Abstract

We revisit the problem of $(1 \pm \varepsilon)$-approximating the $L_p$ norm, for real $p$ with
$0 \le p \le 2$, of a length-$n$ vector updated in a length-$m$ stream with updates to its
coordinates. We assume the updates are integers in the range $[-M, M]$. We prove new bounds on the
space and time complexity of this problem. In many cases our results are optimal.

1. We give a 1-pass space-optimal algorithm for $L_p$-estimation for constant $p$, $0 < p < 2$.
Namely, we give an algorithm using $O(\varepsilon^{-2} \log(mM) + \log\log n)$ bits of space to
estimate $L_p$ within relative error $\varepsilon$ with constant probability. Unlike previous
algorithms which achieved optimal dependence on $1/\varepsilon$, but suboptimal dependence on $n$
and $m$, our algorithm does not use a generic pseudorandom generator (PRG).

2. We improve the 1-pass lower bound on the space to $\Omega(\varepsilon^{-2} \log(\varepsilon^2 N))$
bits for real constant $p \ge 0$ and $1/\sqrt{N} \le \varepsilon \le 1$, where $N = \min\{n, m\}$.
If $p > 0$, the bound improves to $\Omega(\min\{N, \varepsilon^{-2} \log(\varepsilon^2 mM)\})$. Our
bound is based on showing a direct sum property for the 1-way communication of the gap-Hamming
problem.

3. For $p = 0$, we give an algorithm which matches our space lower bound up to an
$O(\log(1/\varepsilon) + \log\log(mM))$ factor. Our algorithm is the first space-efficient
algorithm to achieve $O(1)$ update and reporting time. Our techniques also yield a 1-pass
$O((\varepsilon^{-2} + \log N) \log\log N + \log\log n)$-space algorithm for estimating $F_0$, the
number of distinct elements in the update-only model, with $O(1)$ update and reporting time. This
significantly improves upon previous algorithms achieving this amount of space, which suffered
from $O(\varepsilon^{-2})$ worst-case update time.

4. We reduce the space complexity of dimensionality reduction in a stream with respect to the
$L_2$ norm by replacing the use of Nisan's PRG in Indyk's algorithm with an improved PRG built by
efficiently combining an extractor of Guruswami, Umans, and Vadhan with a PRG construction of
Armoni. The new PRG stretches a seed of $O((S/(\log S - \log\log R + O(1))) \log R)$ bits to $R$
bits fooling space-$S$ algorithms for any $R = 2^{O(S)}$, improving the $O(S \log R)$ seed length
of Nisan's PRG. Many existing algorithms rely on Nisan's PRG, and this new PRG reduces the space
complexity of these algorithms.

Our results immediately imply various separations between the complexity of $L_p$-estimation in
different update models, one versus multiple passes, and $p = 0$ versus $p > 0$.

1 Introduction



Computing over massive data streams is increasingly important. Large data sets, such as sensor 
networks, transaction data, the web, and network traffic, have grown at a tremendous pace. It is 
impractical for most devices to store even a small fraction of the data, and this necessitates the 
design of extremely efficient algorithms. Such algorithms are often only given a single pass over 
the data, e.g., it may be expensive to read the contents of an external disk multiple times, and in 
the case of an internet router, it may be impossible to make multiple passes. 

Even very basic statistics of a data set cannot be computed exactly or deterministically in this 
model, and so algorithms must be both approximate and probabilistic. This model is known as 
the streaming model and has become popular in the theory community, dating back to the works 
of Munro and Paterson [38] and Flajolet and Martin [18], and resurging with the work of Alon,
Matias, and Szegedy [2]. For a survey of results, see the book by Muthukrishnan [39], or notes 
from Indyk's course [26]. 

A fundamental problem in this area is that of norm estimation [2]. Formally, we have a vector
$a = (a_1, \ldots, a_n)$ initialized as $a = 0$, and a stream of $m$ updates, where an update
$(i, v) \in [n] \times \{-M, \ldots, M\}$ causes the change $a_i \leftarrow a_i + v$. If the $a_i$
are guaranteed to be non-negative at all times, this is called the strict turnstile model; else it
is called the turnstile model. Our goal is to output a $(1 \pm \varepsilon)$-approximation to the
value $L_p(a) = (\sum_{i=1}^n |a_i|^p)^{1/p}$. Sometimes this problem is posed as estimating
$F_p(a) = L_p^p(a)$, which is called the $p$-th frequency moment of $a$. A large body of work has
been done in this area; see, e.g., the references in [26, 39].
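
As a point of reference for the definitions above, the following minimal Python sketch applies turnstile updates exactly and evaluates $L_p$ directly. It stores the whole vector, so it is the trivial linear-space baseline rather than a streaming algorithm; the helper names `apply_updates` and `lp_norm` are ours and purely illustrative.

```python
from collections import defaultdict


def apply_updates(updates):
    """Apply turnstile updates (i, v), i.e. a_i <- a_i + v, to the zero vector."""
    a = defaultdict(int)
    for i, v in updates:
        a[i] += v
    return a


def lp_norm(a, p):
    """Exact L_p of the stored vector; p = 0 counts the nonzero coordinates."""
    if p == 0:
        return sum(1 for x in a.values() if x != 0)
    return sum(abs(x) ** p for x in a.values()) ** (1.0 / p)


# Coordinate 1 is inserted and then fully deleted, so it no longer contributes.
a = apply_updates([(1, 3), (2, -4), (1, -3), (5, 2)])
print(lp_norm(a, 0), lp_norm(a, 1), lp_norm(a, 2))
```

Because coordinate 2 goes negative, this example stream is a turnstile stream but not a strict turnstile one.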

When $p = 0$, $L_0 = |\{i : a_i \ne 0\}|$, and it is called the "Hamming norm". In an update-only
stream, i.e., where updates $(i, v)$ always have $v = 1$, this coincides with the well-studied
problem of estimating the number of distinct elements, which is useful for query optimizers in the
context of databases, internet routing, and detecting Denial of Service attacks [1]. The Hamming
norm is also useful in streams with deletions, for which it can be used to measure the
dissimilarity of two streams, which is useful for packet tracing and database auditing [14].

1.1 Results and Techniques 

We prove new upper and lower bounds on the space and time complexity of $L_p$-estimation for
$0 \le p \le 2$.^1 In many cases our results are optimal. We shall use the term update time to
refer to the per-item processing time in the stream, while we use the term reporting time to refer
to the time to output the estimate at any given point in the stream. In what follows in this
section, and throughout the rest of the paper, we omit an implicit additive $\log\log n$ which
exists in all the $L_p$ space upper and lower bounds. In strict turnstile and turnstile streams,
the additive term increases to $\log\log(nmM)$. Each following subsection gives an overview of our
techniques for a problem we consider, and a discussion of previous work. A table listing all our
bounds is also given in Figure 1.

1.1.1 New algorithms for $L_p$-estimation, $0 < p < 2$

Our first result is the first 1-pass space-optimal algorithm for $L_p$-estimation, $0 < p < 2$.
Namely, we give an algorithm using $O(\varepsilon^{-2} \log(mM))$ bits of space to estimate $L_p$
within relative error $\varepsilon$ with constant probability. Unlike the previous algorithms of
Indyk and Li which achieved optimal

^1 When $0 < p < 1$, $L_p$ is not a norm since it does not satisfy the triangle inequality, though it is still well-defined.
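| Problem | upper bound | lower bound | update | reporting |
|---|---|---|---|---|
| $L_p$ | $O(\varepsilon^{-2} \log(mM))$ | $\Omega(\varepsilon^{-2} \log(mM))$ | $O(\varepsilon^{-2})$ | $O(1)$ |
| $L_0$ (1-pass) | $O(\varepsilon^{-2} (\log(1/\varepsilon) + \log\log(mM)) \log N)$ | $\Omega(\varepsilon^{-2} \log N)$ | $O(1)$ | $O(1)$ |
| $L_0$ (2-pass) | $O(\varepsilon^{-2} (\log(1/\varepsilon) + \log\log(mM)) + \log N)$ | $\Omega(\varepsilon^{-2} + \log N)$* | $O(1)$ | $O(1)$ |
| $F_0$ | $O(\varepsilon^{-2} \log\log N + \log(1/\varepsilon) \log N)$ | $\Omega(\varepsilon^{-2} + \log N)$** | $O(1)$ | $O(1)$ |
| $L_2 \to L_2$ | $O(\varepsilon^{-2} \log(nM/(\varepsilon\delta)) \log(n/(\varepsilon\delta)) \log(1/\delta) / \log(1/\varepsilon))$ | $\Omega(\varepsilon^{-2} \log(nM))$ | *** | $O(1)$ |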

Figure 1: Table of our results. The 2nd and 3rd columns are space bounds, in bits, and the 1st row
is for $0 < p < 2$. The last two columns are time. All bounds above are ours, except for * [51 [S]
and ** [U [9] [28] [49] [30] [50]. $N$ denotes $\min\{n, m\}$. All lower bounds hold for
$\varepsilon$ larger than some threshold (e.g., they never go above $\Omega(N)$), and all bounds
are stated for a desired constant probability of success, except for the last row. In the last
row, $1 - \delta$ success probability is desired for $\delta = O(1/t^2)$, where we want to do
$L_2 \to L_2$ dimensionality reduction of $t$ points in a stream, and thus need
$\delta = O(1/t^2)$ to union bound for all pairwise distances to be preserved (the space shown is
for one of the $t$ points). $F_0$ denotes $L_0$ in update-only streams. For ***, the time is
polynomial in the space. Note for rows 1 and 5, the reporting times are $O(1)$ since we can
recompute the estimator during updates.



dependence on $1/\varepsilon$, but suboptimal dependence on $n$ and $m$ [25, 32], our algorithm
uses only $k$-wise independence and does not use a generic pseudorandom generator (PRG). In fact,
the previous algorithms failed to achieve space-optimality precisely because of the use of a PRG
[10]. Our main technical lemma shows that $k$-wise independence preserves the properties of sums
of $p$-stable random variables in a useful way. This is the first example of such a statement
outside the case $p = 2$. PRGs are a central tool in the design of streaming algorithms, and
Indyk's algorithm has become the canonical example of a streaming algorithm for which no
derandomization more efficient than via a generic PRG was known. We believe that removing this
heavy hammer from norm estimation is an important step forward in the derandomization of streaming
algorithms, and that our techniques may spur improved derandomizations of other streaming
algorithms.

To see where our improvement comes from, let us recall Indyk's algorithm [25]. That algorithm
maintains $r = O(1/\varepsilon^2)$ counters $X_j = \sum_{i=1}^n a_i X_{i,j}$, where the $X_{i,j}$
are i.i.d. from a discretized $p$-stable distribution. A $p$-stable distribution $\mathcal{D}$ is
a distribution with the property that, for all vectors $a \in \mathbb{R}^n$ and i.i.d. random
variables $\{X_i\}_{i=1}^n$ from $\mathcal{D}$, it holds that
$\sum_{i=1}^n a_i X_i \sim \|a\|_p X$, where $X \sim \mathcal{D}$. His algorithm then returns the
median of the $|X_j|$. The main issue with Indyk's algorithm, and also a later algorithm of Li
[32], is that the amount of randomness needed to generate the $X_{i,j}$ is
$\Omega(N/\varepsilon^2)$. A polylogarithmic-space algorithm thus cannot afford to store all the
$X_{i,j}$. Indyk remedied this problem by using Nisan's PRG [40], but at the cost of multiplying
his space by a $\log(N/\varepsilon)$ factor.
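
To make the median estimator concrete, here is a small Python sketch for the special case $p = 1$, where the $p$-stable distribution is the Cauchy distribution. It is only an illustration under simplifying assumptions: every $X_{i,j}$ is drawn fresh and fully independently (exactly the randomness cost discussed above), samples are not discretized, and the function name `indyk_l1_median` is ours.

```python
import math
import random
import statistics


def indyk_l1_median(a, r, rng):
    """Median-of-|counters| estimate of ||a||_1 using r Cauchy sketches."""
    counters = []
    for _ in range(r):
        # tan(pi * (U - 1/2)) is a standard Cauchy (1-stable) sample.
        xs = [math.tan(math.pi * (rng.random() - 0.5)) for _ in a]
        counters.append(sum(ai * x for ai, x in zip(a, xs)))
    # The median of |Cauchy| is 1, so no rescaling is needed for p = 1.
    return statistics.median(abs(c) for c in counters)


rng = random.Random(1)
a = [3, -4, 0, 2, 7, -1]
estimate = indyk_l1_median(a, r=4000, rng=rng)
print(estimate, sum(abs(x) for x in a))
```

For other values of $p$ the median of $|X|$ is a different constant, so the output would need to be divided by that constant.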

Our algorithm, like those of Indyk and Li, is also based on $p$-stable distributions. However, we
do not use the median estimator of Indyk, or the geometric mean or harmonic mean estimators of Li.
Rather, we give a new estimator which we show can be derandomized using only $k$-wise independence
for small $k$ (specifically, $k = O(\log(1/\varepsilon)/\log\log\log(1/\varepsilon))$; any
$k = O(1/\varepsilon^2)$ would have given us a space-optimal algorithm, but smaller $k$ gives
smaller update time). We first show that the median estimator of Indyk gives a constant-factor
approximation of $L_p$ with arbitrarily large constant probability as long as $k, r$ are chosen
larger than some constant. Even this was previously not known. Once we have a value $A$ such that
$\|a\|_p/A = \Theta(1)$, we then give an estimator that can $(1 \pm \varepsilon)$-approximate
$\|a\|_p/A$ using only $k$-wise independence. Despite the two-stage nature of our algorithm (first
obtain a constant-factor approximation to $\|a\|_p$, then refine to a
$(1 \pm \varepsilon)$-approximation), our algorithm is naturally implementable in one pass.
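
For readers unfamiliar with $k$-wise independence, the textbook way to obtain it is to evaluate a random polynomial of degree less than $k$ over a prime field. The Python sketch below shows that standard construction; it is not claimed to be the exact hash family used in the paper, and the prime `PRIME`, the final reduction to buckets, and the name `make_kwise_hash` are illustrative assumptions.

```python
import random

PRIME = (1 << 61) - 1  # Mersenne prime; assumed larger than the universe size


def make_kwise_hash(k, num_buckets, rng):
    """Random polynomial of degree < k over GF(PRIME), reduced to num_buckets."""
    coeffs = [rng.randrange(PRIME) for _ in range(k)]

    def h(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        # The field values are k-wise independent; the final reduction
        # to buckets is only approximately uniform.
        return acc % num_buckets

    return h


rng = random.Random(0)
h = make_kwise_hash(k=4, num_buckets=16, rng=rng)
print([h(x) for x in range(8)])
```

The function is described by its $k$ coefficients, so the seed length grows linearly in $k$, which is why a small $k$ matters for space and update time.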

Other work on $L_p$-estimation includes [21], though their scheme uses
$\Omega(\varepsilon^{-2-p}\,\mathrm{polylog}(mM))$ space. For $p > 2$, space polynomial in $n$ is
necessary and sufficient [21 [3 [Sj [TT| [29].

1.1.2 Tight space lower bounds for $L_p$-estimation

To show optimality of our $L_p$-estimation algorithm, for $p > 0$ we improve the space lower bound
to $\Omega(\min\{N, \varepsilon^{-2} \log(\varepsilon^2 mM)\})$ bits. For $p = 0$, we show a lower
bound of $\Omega(\varepsilon^{-2} \log(\varepsilon^2 N))$. Here,
$1/\sqrt{N} \le \varepsilon \le 1$, with $N = \min\{n, m\}$. The previous lower bound in both
cases is $\Omega(\varepsilon^{-2} + \log N)$, and is the result of a sequence of work
[2, 28, 49, 9]. See [MlllQ] for simpler proofs. Since Thorup and Zhang [37] give a time-optimal
variant of the $L_2$-estimation sketch of Alon, Matias, and Szegedy [2], our work closes the
problem of $L_2$-estimation, up to constant factors. Our bound holds even when each coordinate is
updated twice, implying that the space of Feigenbaum et al. [17] for $L_1$-difference estimation
is optimal. Our lower bound is also the first to give a logarithmic dependence on $mM$ (previously
only an $\Omega(\log\log(mM))$ bound was known by a reduction from the communication complexity of
Equality).

Our lower bounds are based upon embedding multiple geometrically-growing hard instances for
estimating $L_p$ in an insertion-only stream into a stream, and using the deletion property
together with the geometrically-growing property to reduce the problem to solving a single hard
instance. More precisely, a hard instance for $L_p$ is based on a reduction from a two-party
communication game in which the first party, Alice, receives a string
$x \in \{0, 1\}^{\varepsilon^{-2}}$, and Bob an index $i \in [\varepsilon^{-2}]$, and Alice sends
a single message to Bob who must output $x_i$ with constant probability. This problem, known as
indexing, requires $\Omega(\varepsilon^{-2})$ bits of communication. To reduce it to estimating
$L_p$ in an insertion-only stream, there is a reduction [28, 49, 50] through the gap-Hamming
problem for which Alice creates a stream $S_x$ and Bob a stream $S_i$, with the property that
either $L_p(S_x \circ S_i) \ge \varepsilon^{-2}/2 + \varepsilon^{-1}/2$, or
$L_p(S_x \circ S_i) \le \varepsilon^{-2}/2 - \varepsilon^{-1}/2$. Here, "$\circ$" denotes
concatenation of two streams. Thus, any 1-pass streaming algorithm which
$(1 \pm \varepsilon)$-approximates $L_p$ requires space which is at least the communication cost
of indexing, namely, $\Omega(\varepsilon^{-2})$.

We instead consider the augmented-indexing problem. Set
$t = \Theta(\varepsilon^{-2} \log(\varepsilon^2 N))$. We give Alice a string $x \in \{0, 1\}^t$
and Bob both an index $i \in [t]$ together with a subset of the bits $x_{i+1}, \ldots, x_t$. This
problem requires $\Omega(t)$ bits of communication if Alice sends only a single message to Bob
[lll36j. Alice splits $x$ into $b = \varepsilon^2 t$ equal-sized blocks $X_0, \ldots, X_{b-1}$. In
the $j$-th block she uses the bits assigned to it to create a stream $S_{X_j}$ that is similar to
what she would have created in the insertion-only case, but each non-zero item is duplicated $2^j$
times. Given $i$, Bob finds the block $j$ to which it belongs, and creates a stream $S_i$ as in
the insertion-only case, but where each non-zero item is duplicated $2^j$ times. Moreover, Bob can
create all the streams $S_{X_{j'}}$ for blocks $j'$ above block $j$. Bob inserts all of these
latter stream items as deletions, while Alice inserts them as insertions. Thus, when running an
$L_p$ algorithm on Alice's list of streams followed by Bob's, all items in streams $S_{X_{j'}}$
vanish. Due to the duplication of non-zero coordinates, approximating $L_p$ well on the entire
stream corresponds to approximating $L_p$ well on $S_{X_j} \circ S_i$, and thus a
$(1 \pm \varepsilon)$-approximation algorithm to $L_p$ can be used to solve augmented-indexing.
For $p > 0$, we can do better by using the universe size to our advantage. Instead of duplicating
each coordinate $2^j$ times in the $j$-th block, we scale each coordinate's frequency by $2^{j/p}$
in the $j$-th block. For constant $p > 0$, this has a similar effect as duplicating coordinates,
since both multiply the contribution to $L_p^p$ by $2^j$. Our technique can be viewed as showing a
direct sum property for the one-way communication complexity of the gap-Hamming problem.

For $p \ne 1$, our lower bound holds even in the strict turnstile model. The assumption that
$p \ne 1$ in the strict turnstile model is necessary, since one can easily compute $L_1$ exactly
in this model by maintaining a counter. Also, as it is known that $L_0$ can be estimated in
$\tilde{O}(\varepsilon^{-2} + \log N)$ bits of space^2 in the update-only model, our lower bound
establishes the first separation of estimating $L_0$ in these two well-studied models. Our
technique also gives the best known lower bound for additive approximation of the entropy in the
strict turnstile model, improving the $\Omega(\varepsilon^{-2})$ bound that followed^3 from the
work of [10] to $\Omega(\varepsilon^{-2} \log(N)/\log(1/\varepsilon))$. Their lower bound though
also holds in the update-only model. Additive estimation of entropy can be used to additively
approximate conditional entropy and mutual information, each of which cannot be multiplicatively
approximated in small space [27]. Variants of our techniques were also applied to establish tight
bounds for linear algebra problems in a stream [13].

1.1.3 Near-optimal algorithms for $L_0$ in turnstile and update-only models

In the case of $L_0$, we give a 1-pass algorithm which is nearly optimal in the most general
turnstile model. Our algorithm needs only
$O(\varepsilon^{-2} \log(\varepsilon^2 N)(\log(1/\varepsilon) + \log\log(mM)))$ bits of space, and
has optimal $O(1)$ update and reporting time. Given our lower bound and a folklore
$\Omega(\log\log(nmM))$ lower bound, our space upper bound is tight up to potentially the
$\log(1/\varepsilon)$ term, and the $\log\log(mM)$ term being multiplicative instead of additive.
Note our algorithm implies a separation between $L_0$ estimation and $L_p$ estimation, $p > 0$,
since we show a logarithmic dependence on $mM$ is necessary for the latter. Our algorithm improves
on prior work which either (1) both assumes the weaker strict turnstile model and uses an extra
$\log(mM)$ factor in space [20], or (2) has space complexity which is worse by at least a
min((log^ N log^ m)/log(mM)), factor [14, 21]. Also, all previous algorithms had at least a
logarithmic dependence on $mM$, and none had $O(1)$ update time. Here we assume the word RAM model
(as did previous work, except [6], for which we later translate their update times to the word RAM
model), where standard arithmetic and bit operations on $\Omega(\log(nmM))$-bit words take
constant time. Furthermore, we show that our algorithm has a natural 2-pass implementation using
$O(\varepsilon^{-2}(\log(1/\varepsilon) + \log\log(mM)) + \log N)$ space. Given our 1-pass lower
bound, this implies the first known separation for $L_0$ between 1 and 2 passes. Furthermore, due
to a recent breakthrough of Brody and Chakrabarti [9], our 2-pass algorithm is optimal up to
$O(\log(1/\varepsilon) + \log\log(mM))$ for any constant number of passes. Finally, we give an
algorithm for estimating $L_0$ in the update-only model, i.e., the number of distinct elements,
with $O((\varepsilon^{-2} + \log N) \log\log N)$ bits of space and $O(1)$ update and reporting
time. Our space is optimal up to the $\log\log N$ factor^4, while our time is optimal. This
greatly improves the time complexity of the only previous algorithms (the 2nd and 3rd algorithms^5
of [6]) with this space complexity, from $O(\varepsilon^{-2})$ to $O(1)$.

We sketch some of our techniques, and the differences with previous work. In both our 1-pass $L_0$
algorithms (update-only and turnstile), we run in parallel a subroutine to obtain a value
$R = \Theta(L_0)$. We also in parallel pairwise independently subsample the universe at a rate of
$1/2^j$ for $j = 1, \ldots, \log(\varepsilon^2 N)$ (note that $L_0 \le N$) to create
$\log(\varepsilon^2 N)$ substreams. This subsampling can be done by hashing into $[N]$ then
sending item $i$ to level $\mathrm{lsb}(h(i))$, where lsb is the least significant bit. At each
level $j$ we feed the $j$th substream into a subroutine which approximates $L_0$ well when

^2 We say $f = \tilde{O}(g)$ if $f = O(g \cdot \mathrm{polylog}(g))$.

^3 Their lower bound is stated against multiplicative approximation, but the additive lower bound
easily follows from their proof.

^4 Our gap to optimality is even smaller for $\varepsilon$ small. See Figure 1.

^5 Their 3rd algorithm has $O(\log(1/\varepsilon) + \log\log N)$ amortized update time, but $O(\varepsilon^{-2})$ worst-case update time.
promised $L_0$ is small. We then base our estimator on the level $j$ with
$L_0/2^j = \Theta(1/\varepsilon^2)$, since the $L_0$ of that substream will be
$(1 \pm \varepsilon) L_0/2^j$ with good probability, so that we can scale back up to get
$(1 \pm \varepsilon) L_0$. The idea of subsampling the stream and using an estimate from some
appropriate level is not new; see, e.g., [6, 20, 22, 43]. For example, the best known algorithm
for $L_0$ estimation in the strict turnstile model, due to Ganguly [20], follows this high-level
approach. We now explain where our techniques differ.

First we discuss the turnstile model. We develop a subroutine using only
$O(\log(N) \log\log(mM))$ space to obtain $R$. Previously, no subroutine using
$o(\log(N) \log(mM))$ space was known. Next, at level $j$ we play a balls-and-bins game where we
throw $A$ balls into $1/\varepsilon^2$ bins $k$-wise independently for
$k = O(\log(1/\varepsilon)/\log\log(1/\varepsilon))$, then base our estimator on the outcome of
this random process. This is similar to Algorithm II of Ganguly [20], which itself was based on
the second algorithm of [6]. The $A$ balls are the $L_0$-contributors mapped to level $j$, and the
bins are counters. In Ganguly's algorithm, he bases his estimator on the number of bins receiving
exactly one ball, and develops a subroutine to use inside each bin which detects this. However,
this subroutine requires $O(\log(mM))$ bits and works only in the strict turnstile model. We
overcome both issues by basing our estimator on the number of bins receiving at least one ball. To
detect if a bin is hit, we cannot simply keep frequency sums since colliding balls could have
frequencies of opposite sign and cancel each other. Instead, each bin maintains the dot product of
frequencies with a random vector over a suitably large finite field. This allows us to both reduce
the $mM$ dependence to doubly logarithmic, and work in the turnstile model. Also, one time
bottleneck is evaluating the $k$-wise independent hash function, but we observe that this can be
done in $O(1)$ time using a scheme of Siegel [l5] after perfectly hashing the universe down to
$[1/\varepsilon^2]$. Furthermore, we non-trivially extend the analysis of [6] to analyze throwing
$A$ balls into $1/\varepsilon^2$ bins with $k$-wise independence when potentially
$A \ll 1/\varepsilon^2$, to deal with the case when $L_0 \ll 1/\varepsilon^2$, since then there is
no $j$ with $L_0/2^j = \Theta(1/\varepsilon^2)$. The algorithm of [6] worked by estimating the
probability that a single bin, say bin 1, is hit. Since their random variable had constant
expectation, the variance was constant for free. In our case, the number of non-empty bins is
non-constant (it grows with $A$), so we need to prove a sharp bound on the variance. Ganguly deals
with small $L_0$ via a separate subroutine, which itself requires $O(\log(1/\varepsilon))$ update
time, and uses space suboptimal by a $\log(mM)$ factor.
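
The bin-hit test described above can be illustrated with a short Python sketch. Each bin keeps the sum of $c_i \cdot v$ over its updates $(i, v)$, reduced modulo a prime, where $c_i$ is a random nonzero field element. This is a simplified illustration only: the class name `TurnstileBins`, the choice of the prime $2^{61} - 1$, and the use of dictionaries in place of hash functions are our assumptions, and storing per-item randomness explicitly would of course defeat the space bound in a real implementation.

```python
import random

P = (1 << 61) - 1  # prime modulus, assumed "suitably large"


class TurnstileBins:
    def __init__(self, num_bins, seed=0):
        self.rng = random.Random(seed)
        self.num_bins = num_bins
        self.counters = [0] * num_bins
        self.bin_of = {}  # item -> bin; stands in for a hash function
        self.coef = {}    # item -> random nonzero field element

    def _params(self, i):
        if i not in self.bin_of:
            self.bin_of[i] = self.rng.randrange(self.num_bins)
            self.coef[i] = self.rng.randrange(1, P)
        return self.bin_of[i], self.coef[i]

    def update(self, i, v):
        b, c = self._params(i)
        self.counters[b] = (self.counters[b] + c * v) % P

    def nonempty_bins(self):
        return sum(1 for c in self.counters if c != 0)


bins = TurnstileBins(num_bins=1)
bins.update(1, 3)
bins.update(2, -3)  # a plain frequency sum would cancel to 0 here
print(bins.nonempty_bins())
```

A bin holding nonzero frequencies reads as empty only if the random combination happens to vanish, which for two colliding items occurs with probability about $1/P$.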

Now we discuss update-only streams. By convention, $L_0$ in the update-only case is typically
referred to as $F_0$. As in our $L_0$ algorithm, we use a balls-and-bins approach, though with a
major difference. Our key to saving space is that all $\log(\varepsilon^2 N)$ levels share the
same bins, and each bin only records the deepest level $j$ in which it was hit. Thus, we can
maintain all bins in the algorithm using $O(\varepsilon^{-2} \log\log(\varepsilon^2 N))$ space as
opposed to $O(\varepsilon^{-2} \log(\varepsilon^2 N))$. An obvious obstacle in our algorithm is
that when counting the number of bins hit at level $j$, our count is obscured by bins that were
hit both at level $j$ and at some deeper level. Since each bin only keeps track of the deepest
level it was hit in, we lose information about shallow levels. Our analysis then leads us to a
more general random process, where there are $A$ "good" balls and $B$ "bad" balls, and we want to
understand the number of "good bins", i.e., bins hit by at least one good ball and no bad balls.
We show that the truly random process is well-approximated even when all balls are thrown $k$-wise
independently. The good balls are the distinct items at level $j$, and the bad ones are those at
deeper levels. As long as $F_0/2^j = \Theta(1/\varepsilon^2)$, we have both that (1)
$A/B = 1 \pm O(\varepsilon)$ with good probability (by Chebyshev's inequality), and (2)
$A = (1 \pm O(\varepsilon)) F_0/2^j$ (also by Chebyshev's inequality). Item (1) allows us to
approximate the expected number of good bins as a function of just $A$, then invert to get $A$.
Item (2) allows us to scale our estimate for $A$ to recover an estimate for $F_0$. Our scheme is
different from [6], which did not subsample the universe, and based its estimator on the fraction
of hash functions in a $k$-wise independent family which map at least one ball to bin 1 (out of
$R$ bins). To estimate this fraction well, [6] required $O(1/\varepsilon^2)$ update time. Our
update time, however, is constant.

1.1.4 Other results: embedding into a normed space and an improved PRG 

Dimensionality reduction is a useful technique for mapping a set of high-dimensional points to 
a set of low-dimensional points with similar distance properties. This technique has numerous 
applications in theoretical computer science, especially the Johnson-Lindenstrauss embedding [31]
for the $L_2$ norm. Viewing the underlying vector of the data stream as a point in $n$-dimensional
space, given two points $a, b \in [M]^n$ in two different streams, one can view our sketches
$S_a, S_b$ as a type of dimensionality reduction, so that $\|a - b\|_p$ can be estimated from the
sketches $S_a$ and $S_b$. Unfortunately, our sketches (as well as previous sketches for estimating
$L_p$) are not in a normed space, and this could restrict their application as a dimensionality
reduction technique. This is because there are many algorithms, such as nearest-neighbor
algorithms, designed for normed spaces. Indyk [25] overcomes this for the important case of $L_2$
by doing the following. His streaming algorithm maintains $Ta$, where $a$ is the vector in the
stream, and $T$ is an implicitly defined sketching matrix whose entries are pseudorandomly
generated normal random variables. From $Ta$ and $Tb$, $\|Ta - Tb\|_2$ gives a
$(1 \pm \varepsilon)$-approximation to $\|a - b\|_2$, and this gives an embedding into a normed
space. The space is
$O(\varepsilon^{-2} \log(nM/(\varepsilon\delta)) \log(n/(\varepsilon\delta)) \log(1/\delta))$
bits, where $\delta$ is the desired failure probability.

We reduce the space complexity of this scheme by a $\log(1/\varepsilon)$ factor by replacing the
use of Nisan's PRG [40] in Indyk's algorithm with an improved version of Armoni's PRG [3]. When
Armoni wrote his original PRG construction, time- and space-efficient optimal extractors were not
known, so his PRG would only improve Indyk's use of Nisan's PRG when $\varepsilon$ was
sufficiently small. We show that a recent optimal extractor construction of Guruswami, Umans, and
Vadhan [23] can be modified to be computable in linear space and thus fed into Armoni's
construction to improve his PRG. Specifically, the improved Armoni PRG stretches a seed of
$O((S/(\log S - \log\log R + O(1))) \log R)$ bits to $R$ bits fooling space-$S$ algorithms for any
$R = 2^{O(S)}$, improving the $O(S \log R)$ seed length of Nisan's PRG. As many existing streaming
algorithms rely on Nisan's PRG, using this PRG instead reduces the space complexity of these
algorithms.

Much of the reason the GUV extractor implementation described in [23] does not use linear 
space is its reliance on Shoup's algorithm [U] for finding irreducible polynomials over small finite 
fields, and in fact most of the implementation modifications we make are so that the GUV extractor 
can avoid all calls to Shoup's algorithm. 

1.2 Other Previous Work 

Here we discuss other previous work not mentioned above. $L_0$-estimation in the update-only model
was first considered by Flajolet and Martin [18], who assumed the existence of hash functions with
properties that are unknown to exist to obtain a constant-factor approximation. The ideal hash
function assumption was later removed in [2]. Bar-Yossef et al. [6] provide the best previous
algorithms, described above in Section 1.1.3. Estan, Varghese, and Fisk [16] give an algorithm
which assumes a random oracle and an $O(1)$-approximation to $L_0$, and seems to achieve
$O(\varepsilon^{-2} \log N)$ space with $O(\log N)$ update time, though a formal analysis is not
given. There is a previous algorithm for $L_0$-estimation in the turnstile model due to Cormode et
al. [14] which needs to store $O(\varepsilon^{-2})$
1. Maintain $A_j = \sum_{i=1}^n a_i X_{i,j}$ for $j \in [r]$, $r = \Theta(1/\varepsilon^2)$. Each
$X_{i,j}$ is distributed according to $\mathcal{D}_p$. For fixed $j$, the $X_{i,j}$ are $k$-wise
independent with $k = \Theta(\log(1/\varepsilon)/\log\log\log(1/\varepsilon))$. For $j \ne j'$,
the seeds used to generate the $\{X_{i,j}\}_{i=1}^n$ and $\{X_{i,j'}\}_{i=1}^n$ are pairwise
independent.

2. Let $A = \mathrm{median}\{|A_j|\}_{j=1}^r$. Output
$A \cdot \left(-\ln\left(\frac{1}{r} \sum_{j=1}^r \cos\left(\frac{A_j}{A}\right)\right)\right)^{1/p}$.

Figure 2: $L_p$ estimation algorithm pseudocode, $0 < p < 2$

random variables from a $p$-stable distribution for $p = O(\varepsilon/\log(mM))$ and has
$O(\varepsilon^{-2})$ update time, though the precision needed to hold $p$-stable samples for such
small $p$ is $\Omega(\varepsilon^{-1} \log N)$, making their overall space dependence on
$1/\varepsilon$ cubic. Work of Cormode and Ganguly [21] implies an algorithm
with 0(e~^ log^ N log^ {mM)) space and 0(log^ A^log(mM)) worst-case update time in the turnstile 
model. 

1.3 Notation 

For integer $z > 0$, $[z]$ denotes the set $\{1, \ldots, z\}$. For our upper bounds we let $[U]$
denote the universe. That is, upon receiving an update $(i, v)$ in the stream, we assume
$i \in [U]$. We can assume $U = \min\{n, O(m^2)\}$ with at most an additive $O(\log\log n)$ in all
our $L_p$ space upper bounds. Though this is somewhat standard, achieving an additive
$O(\log\log n)$ as opposed to $O(\log n)$ is perhaps less well-known, so we include justification
in Section A.1. All our space upper and lower bounds are measured in bits.

We also use $\mathrm{lsb}(x)$ to denote the least significant bit of an integer $x$ when written
in binary. We note when $x$ fits in a machine word, $\mathrm{lsb}(x)$ can be computed in $O(1)$
time [HI I19j.
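
As a small illustration, the Python sketch below computes lsb with a standard two's-complement bit trick and shows why it yields geometric subsampling rates. We assume here that $\mathrm{lsb}(x)$ means the 0-based index of the lowest set bit of $x > 0$; that indexing convention and the function name `lsb` are our choices for the example.

```python
def lsb(x: int) -> int:
    """0-based index of the least significant set bit of a positive integer."""
    if x <= 0:
        raise ValueError("lsb is only defined here for positive integers")
    # x & -x isolates the lowest set bit; bit_length turns it into an index.
    return (x & -x).bit_length() - 1


# Among the values 1..1024, level j receives a 1/2^(j+1) fraction of them.
level_counts = [sum(1 for x in range(1, 1025) if lsb(x) == j) for j in range(11)]
print(level_counts)
```

If hash values are close to uniform, sending item $i$ to level $\mathrm{lsb}(h(i))$ therefore samples each level at a geometrically decreasing rate.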

2 $L_p$ Estimation ($0 < p < 2$)

Here we describe our space-optimal $L_p$ estimation algorithm mentioned in Section 1.1.1, as well
as the approach mentioned in Section 1.1.4 of using an improved PRG.

2.1 An Optimal Algorithm 

We assume $p$ is a fixed constant. Some constants in our asymptotic notation are functions of $p$.
We also assume $\|a\|_p > 0$; $\|a\|_p = 0$ is detected when $A = 0$ in Figure 2. Finally, we
assume $\varepsilon \ge 1/\sqrt{m}$. Otherwise, the trivial solution of keeping the entire stream
in memory requires $O(m \log(UM)) = O(\varepsilon^{-2} \log(NM)) = O(\varepsilon^{-2} \log(mM))$
space. The main theorem of this section is the following.

Theorem 2.1. Let $0 < p < 2$ be a fixed real constant. The algorithm of Figure 2 uses space
$O(\varepsilon^{-2} \log(mM))$ and outputs $(1 \pm \varepsilon)\|a\|_p$ with probability at least
$2/3$.

To understand the first step of Figure 2, we recall the definition of a $p$-stable distribution.

Definition 2.2 (Zolotarev [51]). For $0 < p < 2$, there exists a probability distribution
$\mathcal{D}_p$ called the $p$-stable distribution with $\mathbf{E}[e^{itX}] = e^{-|t|^p}$ for
$X \sim \mathcal{D}_p$. For any integer $n > 0$ and vector $a \in \mathbb{R}^n$, if
$X_1, \ldots, X_n \sim \mathcal{D}_p$ are independent, then
$\sum_{i=1}^n a_i X_i \sim \|a\|_p \mathcal{D}_p$.
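
The following Python sketch illustrates how the characteristic function in Definition 2.2 can be turned into an estimator: taking real parts gives $\mathbf{E}[\cos(tX)] = e^{-|t|^p}$, so averaging $\cos(A_j/A)$ over counters $A_j \sim \|a\|_p \mathcal{D}_p$ and inverting recovers $\|a\|_p$. This is a sketch under loud assumptions, not the paper's implementation: all samples are fully independent rather than $k$-wise independent, nothing is discretized, samples come from the standard Chambers-Mallows-Stuck method, and the names `sample_p_stable` and `estimate_lp` are ours.

```python
import math
import random
import statistics


def sample_p_stable(p, rng):
    """Chambers-Mallows-Stuck sample with characteristic function exp(-|t|^p)."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    if p == 1:
        return math.tan(theta)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero exponential sample
        w = rng.expovariate(1.0)
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
            * (math.cos((1 - p) * theta) / w) ** ((1 - p) / p))


def estimate_lp(a, p, r, rng):
    """Median for a rough scale A, then invert the empirical mean of cos(A_j / A)."""
    counters = [sum(ai * sample_p_stable(p, rng) for ai in a) for _ in range(r)]
    scale = statistics.median(abs(c) for c in counters)
    if scale == 0:
        return 0.0  # corresponds to detecting ||a||_p = 0
    mean_cos = sum(math.cos(c / scale) for c in counters) / r
    return scale * (-math.log(mean_cos)) ** (1.0 / p)


rng = random.Random(2)
a, p = [3, -4, 0, 2, 7, -1], 1.5
estimate = estimate_lp(a, p, r=3000, rng=rng)
print(estimate, sum(abs(x) ** p for x in a) ** (1.0 / p))
```

The median only needs to be within a constant factor of $\|a\|_p$; the accuracy comes from the averaged cosine term.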

To prove Lemma 2.4, which is at the heart of the correctness of our algorithm, we use the
following lemma.

Lemma 2.3 (Nolan [41, Theorem 1.12]). For fixed $0 < p < 2$, the probability density function of
the $p$-stable distribution is $O(|x|^{-1-p})$.

Now we prove our main technical lemma. 

Lemma 2.4. Let $n$ be a positive integer and $0 < \varepsilon < 1$. Let $f(z)$ be a function
holomorphic on the complex plane with $|f(z)| = e^{O(1 + |\Im(z)|)}$, where $\Im(z)$ denotes the
imaginary part of $z$. Let $k = \log(1/\varepsilon)/\log\log\log(1/\varepsilon)$. Let
$a_1, \ldots, a_n$ be real numbers with $\|a\|_p = O(1)$. Let $X_i$ be a $3Ck$-independent family
of $p$-stable random variables for $C$ a suitably large even constant. Let $Y_i$ be a fully
independent family of $p$-stable random variables. Let $X = \sum_i a_i X_i$ and
$Y = \sum_i a_i Y_i$. Then $\mathbf{E}[f(X)] = \mathbf{E}[f(Y)] + O(\varepsilon)$.

Proof. The basic idea of the proof will be to show that the expectation can be computed to within
$O(\varepsilon)$ just by knowing that the $X_i$'s are $O(k)$-independent. Our main idea is to
approximate $f$ by a Taylor series and use the fact that we know the moments of the $X_i$. The
problem is that the tails of the variables $X_i$ are too wide, and hence the moments are not
defined. In order to solve this we will need to truncate some of them in order to get finite
moments.

First, we use Cauchy's integral formula to bound the high-order derivatives of $f$.

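Lemma 2.5. Let $f^{(\ell)}$ denote the $\ell$th derivative of $f$. Then, $|f^{(\ell)}| = e^{O(\ell)}$ on $\mathbb{R}$.

Proof. For $x \in \mathbb{R}$, let $C$ be the circle of radius $\ell$ centered at $x$ in the
complex plane. By Cauchy's integral formula,

$$
\begin{aligned}
|f^{(\ell)}(x)| &= \left| \frac{\ell!}{2\pi i} \oint_C \frac{f(z)}{(z - x)^{\ell+1}}\, dz \right| \\
&\le \frac{\ell!}{2\pi} \int_0^{2\pi} \frac{|f(x + \ell e^{it})|}{\ell^{\ell}}\, dt \\
&\le \frac{\ell!}{2\pi \ell^{\ell}} \int_0^{2\pi} e^{O(1 + |\ell \sin(t)|)}\, dt \\
&\le \frac{1}{2\pi} \int_0^{2\pi} e^{O(\ell)}\, dt \\
&= e^{O(\ell)}.
\end{aligned}
$$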



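Now, define the random variable

$$
B_i = \begin{cases} 0 & \text{if } |a_i X_i| > 1 \\ 1 & \text{otherwise.} \end{cases}
$$

Let

$$
U_i = 1 - B_i = \begin{cases} 1 & \text{if } |a_i X_i| > 1 \\ 0 & \text{otherwise,} \end{cases}
$$

and let

$$
X'_i = B_i X_i = \begin{cases} 0 & \text{if } |a_i X_i| > 1 \\ X_i & \text{otherwise.} \end{cases}
$$

Lastly, define the random variable

$$
D = \sum_{i=1}^n U_i.
$$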
We note a couple of properties of these. In particular

$$
\mathbf{E}[U_i] = O\left(\int_{1/|a_i|}^{\infty} x^{-1-p}\, dx\right) = O(|a_i|^p).
$$

We would also like to bound the moments of $X'_i$. In particular we note that
$\mathbf{E}[(a_i X'_i)^{\ell}]$ is $1$ for $\ell = 0$, by symmetry is $0$ when $\ell$ is odd, and
otherwise is

$$
O\left(\int_0^{1/|a_i|} (a_i x)^{\ell} x^{-p-1}\, dx\right) = O\left(|a_i|^{\ell} |a_i|^{-\ell+p}\right) = O(|a_i|^p) \qquad (2.1)
$$

where the implied constant above can be chosen to hold independently of $\ell$ (in fact we can
pick a better constant if $\ell$ is large).

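We will approximate $\mathbf{E}[f(X)]$ as

$$\mathbf{E}\left[\sum_{S,T} (-1)^{|T|}\left(\prod_{i\in S} U_i\right)\left(\prod_{i\in T} U_i\right) f\left(\sum_{i\in S} a_i X_i + \sum_{i\notin S} a_i X_i'\right)\right] \qquad (2.2)$$

where the outer sum is over pairs of subsets $S, T \subseteq [n]$, with $|S|, |T| \le Ck$, and $S$ and $T$ disjoint.
Call the function inside the expectation in Eq. (2.2) $F(\vec{X})$. We would like to bound the error in
approximating $f(X)$ by $F(\vec{X})$. Fix values of the $X_i$, and let $O$ be the set of $i$ so that $U_i = 1$. We
note that

$$F(\vec{X}) = \sum_{\substack{S\subseteq O \\ |S|\le Ck}}\ \sum_{\substack{T\subseteq O\setminus S \\ |T|\le Ck}} (-1)^{|T|}\, f\left(\sum_{i\in S} a_i X_i + \sum_{i\notin S} a_i X_i'\right).$$

Notice that other than the $(-1)^{|T|}$ term, the expression inside the sum does not depend on $T$. This
means that if $0 < |O\setminus S| \le Ck$ then the inner sum is 0, since $O\setminus S$ will have exactly as many even
subsets as odd ones. Hence if $|O| \le Ck$, we have that

$$F(\vec{X}) = \sum_{S=O} f\left(\sum_{i\in S} a_i X_i + \sum_{i\notin S} a_i X_i'\right) = f\left(\sum_{i\in O} a_i X_i + \sum_{i\notin O} a_i X_i'\right) = f(X).$$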

Otherwise, after fixing $O$ and $S$, we can sum over possible values of $t = |T|$ and obtain:

$$\sum_{\substack{T\subseteq O\setminus S \\ |T|\le Ck}} (-1)^{|T|} = \sum_{t=0}^{Ck} (-1)^t \binom{|O\setminus S|}{t}.$$



In order to bound this we use the following Lemma: 

Lemma 2.6. For integers $A \ge B + 1 > 0$ we have that $\sum_{i=0}^{B}(-1)^i\binom{A}{i}$ and $\sum_{i=0}^{B+1}(-1)^i\binom{A}{i}$ have
different signs, with the latter sum being 0 if $A = B + 1$.

Proof. First suppose that $B < A/2$. We note that since the terms in each sum are increasing
in $i$ in absolute value, each sum has the same sign as its last term, proving our result in this case. For $B \ge
A/2$ we note that $\sum_{i=0}^{A}(-1)^i\binom{A}{i} = 0$, and hence letting $j = A - i$, we can replace the sums by
$(-1)^{A+1}\sum_{j=0}^{A-B-1}(-1)^j\binom{A}{j}$ and $(-1)^{A+1}\sum_{j=0}^{A-B-2}(-1)^j\binom{A}{j}$, reducing to the case of $B' = A - B - 1 <
A/2$. ■
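Lemma 2.6 is easy to sanity-check by brute force. The following is a minimal sketch, not part of the original argument: the function names are ours, and the check only covers a finite range of $A$ and $B$. It also checks the consequence used next in the text, namely that the truncated alternating sum is bounded in absolute value by the first omitted binomial coefficient.

```python
from math import comb


def alternating_partial_sum(A: int, B: int) -> int:
    """Return sum_{i=0}^{B} (-1)^i * C(A, i)."""
    return sum((-1) ** i * comb(A, i) for i in range(B + 1))


def lemma_2_6_holds(A: int, B: int) -> bool:
    """Check the claim of Lemma 2.6 for one pair of integers A >= B + 1 > 0."""
    first = alternating_partial_sum(A, B)
    second = alternating_partial_sum(A, B + 1)
    if A == B + 1:
        # The longer sum is the full alternating sum, which vanishes.
        return second == 0
    # Otherwise the two partial sums have strictly opposite signs.
    return first * second < 0


def truncation_bound_holds(A: int, B: int) -> bool:
    """Consequence used in the text: |sum up to B| <= C(A, B + 1)."""
    return abs(alternating_partial_sum(A, B)) <= comb(A, B + 1)


if __name__ == "__main__":
    ok = all(
        lemma_2_6_holds(A, B) and truncation_bound_holds(A, B)
        for A in range(1, 40)
        for B in range(0, A)
    )
    print("Lemma 2.6 verified on the tested range:", ok)
```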

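Using Lemma 2.6 we note that $\sum_{t=0}^{Ck}(-1)^t\binom{|O\setminus S|}{t}$ and $\sum_{t=0}^{Ck+1}(-1)^t\binom{|O\setminus S|}{t}$ have different signs.
Therefore we have that

$$\left|\sum_{t=0}^{Ck}(-1)^t\binom{|O\setminus S|}{t}\right| \le \binom{|O\setminus S|}{Ck+1} = \binom{D-|S|}{Ck+1}.$$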



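Recalling that $|f|$ is bounded, we are now ready to bound $|F(\vec{X}) - f(X)|$. Recall that if $D \le Ck$,
this is 0, and otherwise we have that

$$
\begin{aligned}
\left|F(\vec{X}) - f(X)\right| &\le O\left(1 + \sum_{\substack{S\subseteq O \\ |S|\le Ck}} \binom{D-|S|}{Ck+1}\right) \\
&= O\left(\sum_{s=0}^{Ck} \binom{D}{s}\binom{D-s}{Ck+1}\right) \\
&= O\left(\sum_{s=0}^{Ck} \binom{Ck+s+1}{s}\binom{D}{Ck+s+1}\right) \\
&= O\left(\sum_{s=0}^{Ck} 2^{Ck+s+1}\binom{D}{Ck+s+1}\right).
\end{aligned}
$$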



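Therefore we can bound the error as

$$\left|\mathbf{E}[F(\vec{X})] - \mathbf{E}[f(X)]\right| = O\left(\sum_{s=0}^{Ck} 2^{Ck+s+1}\,\mathbf{E}\left[\binom{D}{Ck+s+1}\right]\right).$$

We note that

$$\binom{D}{Ck+s+1} = \sum_{\substack{I\subseteq[n] \\ |I| = Ck+s+1}}\ \prod_{i\in I} U_i.$$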



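Hence by linearity of expectation and $(2Ck+1)$-independence,

$$\mathbf{E}\left[\binom{D}{Ck+s+1}\right] = \sum_{\substack{I\subseteq[n] \\ |I| = Ck+s+1}} \mathbf{E}\left[\prod_{i\in I} U_i\right] = \sum_{\substack{I\subseteq[n] \\ |I| = Ck+s+1}}\ \prod_{i\in I} O(|a_i|^p) = \sum_{\substack{I\subseteq[n] \\ |I| = Ck+s+1}} \left(\prod_{i\in I} |a_i|^p\right) e^{O(Ck)}.$$

We note that when this sum is multiplied by $(Ck+s+1)!$, these terms all show up in the expansion
of $(\|a\|_p^p)^{Ck+s+1}$. In fact, more generally for any integer $0 \le t \le n$,

$$\sum_{\substack{I\subseteq[n] \\ |I| = t}}\ \prod_{i\in I} |a_i|^p \le \frac{\|a\|_p^{tp}}{t!}. \qquad (2.3)$$

Hence

$$\mathbf{E}\left[\binom{D}{Ck+s+1}\right] = \frac{e^{O(Ck)}}{(Ck+s+1)!}.$$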
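Eq. (2.3) says that the degree-$t$ elementary symmetric polynomial of the nonnegative numbers $|a_i|^p$ is at most $(\sum_i |a_i|^p)^t / t!$. A small numerical sketch of this inequality follows; the helper names and the test vector are illustrative assumptions, not from the paper.

```python
from itertools import combinations
from math import factorial, prod


def elementary_symmetric(values, t):
    """Sum over all size-t subsets I of the product of values[i] for i in I."""
    return sum(prod(subset) for subset in combinations(values, t))


def eq_2_3_sides(a, p, t):
    """Return (lhs, rhs) of Eq. (2.3) for the vector a, exponent p and subset size t."""
    weights = [abs(x) ** p for x in a]
    lhs = elementary_symmetric(weights, t)
    rhs = sum(weights) ** t / factorial(t)   # equals ||a||_p^{tp} / t!
    return lhs, rhs


if __name__ == "__main__":
    a = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5]  # hypothetical vector
    for t in range(len(a) + 1):
        lhs, rhs = eq_2_3_sides(a, 1.5, t)
        print(t, lhs <= rhs * (1 + 1e-12))
```

The inequality holds because every product over a size-$t$ subset appears $t!$ times in the multinomial expansion of $(\sum_i |a_i|^p)^t$, and all remaining terms are nonnegative.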



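Therefore we have that

$$
\begin{aligned}
\left|\mathbf{E}[F(\vec{X})] - \mathbf{E}[f(X)]\right| &= O\left(\sum_{s=0}^{Ck} \frac{e^{O(Ck)}}{(Ck+s+1)!}\right) = e^{O(Ck)}(Ck)^{-Ck} \\
&= \exp(-Ck\log k + O(Ck)) \\
&= \exp\left(-\frac{C\log(1/\varepsilon)\log\log(1/\varepsilon)}{\log\log\log(1/\varepsilon)} + O(k)\right) = O(\varepsilon).
\end{aligned}
$$

Hence it suffices to approximate $\mathbf{E}[F(\vec{X})]$.
Let

$$F(\vec{X}) = \sum_{\substack{S,T\subseteq[n] \\ |S|,|T|\le Ck \\ S\cap T=\emptyset}} (-1)^{|T|}\, F_{S,T}(\vec{X})$$

where

$$F_{S,T}(\vec{X}) = \left(\prod_{i\in S\cup T} U_i\right) f\left(\sum_{i\in S} a_i X_i + \sum_{i\notin S} a_i X_i'\right).$$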

We will attempt to compute the conditional expectation of $F_{S,T}(\vec{X})$ conditioned on the values of
$X_i$ for $i \in S\cup T$. It should be noted that the independence on the $X_i$'s is sufficient that the values
of the $X_i$ for $i \in S\cup T$ are completely independent of one another, and that even having fixed these
values, the other $X_i$ are still $Ck$-independent.

We begin by making some definitions. Let $R = [n]\setminus(S\cup T)$. Having fixed $S$, $T$, and the values
of $X_i$ for $i \in S\cup T$, we let $c = \sum_{i\in S} a_i X_i$ and $X' = \sum_{i\in R} a_i X_i'$. We note that unless $U_i = 1$ for
all $i \in S\cup T$, $F_{S,T}(\vec{X}) = 0$, and otherwise that

$$F_{S,T}(\vec{X}) = f(c + X').$$



This is because if $U_i = 1$ for some $i \in T$, then $X_i' = 0$. Let $p_c(x)$ be the Taylor series for $f(c+x)$
about $x = 0$ truncated so that its highest degree term is degree $Ck - 1$. We will attempt to
approximate $\mathbf{E}[f(c+X')]$ by $\mathbf{E}[p_c(X')]$. By Taylor's theorem, Lemma 2.5 and the fact that $C$ is even,

$$|p_c(x) - f(c+x)| \le \frac{|x|^{Ck}\, e^{O(Ck)}}{(Ck)!} = \frac{x^{Ck}\, e^{O(Ck)}}{(Ck)!}. \qquad (2.4)$$

We note that $\mathbf{E}[p_c(X')]$ is determined simply by the independence properties of the $X_i$ since it is
a low-degree polynomial in functions of the $X_i$.

We now attempt to bound the error in approximating $f(x+c)$ by $p_c(x)$. In order to do so we will
wish to bound $\mathbf{E}[(X')^{Ck}]$. Let $\ell = Ck$. We have that $\mathbf{E}[(X')^{\ell}] = \mathbf{E}\left[\left(\sum_{i\in R} a_i X_i'\right)^{\ell}\right]$. Expanding this
out and using linearity of expectation yields a sum of terms of the form $\mathbf{E}\left[\prod_{i\in R}(a_i X_i')^{\ell_i}\right]$, for some
non-negative integers $\ell_i$ summing to $\ell$. Let $L$ be the set of $i$ so that $\ell_i > 0$. Since $|L| \le \ell$, which is at
most the degree of independence, Eq. (2.1) implies that the above expectation is $\left(\prod_{i\in L}|a_i|^p\right)e^{O(|L|)}$.
Notice that the sum of the coefficients in front of such terms with a given $L$ is at most $|L|^{\ell}$. This is
because for each term in the product, we need to select an $i \in L$. Eq. (2.3) implies that summing
$\prod_{i\in L}|a_i|^p$ over subsets $L$ of size $s$ gives at most $\|a\|_p^{ps}/s!$. Putting everything together we find
that:

$$\mathbf{E}\left[(X')^{\ell}\right] \le \sum_{s=1}^{\ell} \frac{s^{\ell}\, e^{O(s)}}{s!} = \sum_{s=1}^{\ell} \exp\left(\ell\log(s) - s\log s + O(s)\right).$$
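The summand (ignoring the $O(s)$) is maximized when $\ell/s = \log(s) + 1$. This happens when $s = O\left(\frac{\ell}{\log \ell}\right)$.
Since the sum is at most $\ell$ times the biggest term, we get that

$$\mathbf{E}\left[(X')^{\ell}\right] \le \exp\left(\ell\log(\ell) - \ell\log\log(\ell) + O(\ell)\right).$$

Therefore we have that

$$
\begin{aligned}
\left|\mathbf{E}[f(c+X')] - \mathbf{E}[p_c(X')]\right| &\le \mathbf{E}\left[\frac{(X')^{\ell}\, e^{O(\ell)}}{\ell!}\right] \\
&\le \exp\left(\ell\log(\ell) - \ell\log\log(\ell) - \ell\log(\ell) + O(\ell)\right) \\
&= \exp\left(-\ell\log\log(\ell) + O(\ell)\right) \\
&= \exp\left(-\frac{C\log(1/\varepsilon)\log\log\log(1/\varepsilon)}{\log\log\log(1/\varepsilon)} + o(\log(1/\varepsilon))\right) \\
&= \exp\left(-(C + o(1))\log(1/\varepsilon)\right) = O(\varepsilon).
\end{aligned}
$$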



So to summarize:

$$\mathbf{E}[f(X)] = \mathbf{E}[F(\vec{X})] + O(\varepsilon).$$



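Now,

$$
\begin{aligned}
\mathbf{E}[F(\vec{X})] &= \sum_{\substack{S,T\subseteq[n] \\ |S|,|T|\le Ck \\ S\cap T=\emptyset}} (-1)^{|T|}\,\mathbf{E}\left[F_{S,T}(\vec{X})\right] \\
&= \sum_{\substack{S,T\subseteq[n] \\ |S|,|T|\le Ck \\ S\cap T=\emptyset}} (-1)^{|T|} \int_{\{x_i\}_{i\in S\cup T}} \left(\prod_{i\in S\cup T} U_i\right) \mathbf{E}[f(c+X')]\, dX_i(x_i) \\
&= \sum_{\substack{S,T\subseteq[n] \\ |S|,|T|\le Ck \\ S\cap T=\emptyset}} (-1)^{|T|} \int_{\{x_i\}_{i\in S\cup T}} \left(\prod_{i\in S\cup T} U_i\right) \left(\mathbf{E}[p_c(X')] + O(\varepsilon)\right) dX_i(x_i).
\end{aligned}
$$

We recall that the term involving $\mathbf{E}[p_c(X')]$ is entirely determined by the $3Ck$-independence of the
$X_i$'s. We are left with an error of magnitude

$$
\begin{aligned}
O(\varepsilon)\cdot\left(\sum_{\substack{S,T\subseteq[n] \\ |S|,|T|\le Ck \\ S\cap T=\emptyset}} \int_{\{x_i\}_{i\in S\cup T}} \left(\prod_{i\in S\cup T} U_i\right) dX_i(x_i)\right)
&\le O(\varepsilon)\cdot\left(\sum_{\substack{S,T\subseteq[n] \\ |S|,|T|\le Ck \\ S\cap T=\emptyset}} \mathbf{E}\left[\prod_{i\in S\cup T} U_i\right]\right) \\
&\le O(\varepsilon)\cdot\left(\sum_{\substack{S,T\subseteq[n] \\ |S|,|T|\le Ck \\ S\cap T=\emptyset}}\ \prod_{i\in S\cup T} e^{O(1)}|a_i|^p\right).
\end{aligned}
$$

Letting $s = |S| + |T|$, we change this into a sum over $s$. We use Eq. (2.3) to deal with the product.
We also note that given $S \cup T$, there are at most $2^s$ ways to pick $S$ and $T$. Putting this together
we determine that the above is at most

$$O(\varepsilon)\cdot\left(\sum_{s=0}^{2Ck} \frac{e^{O(s)}}{s!}\right) = O(\varepsilon).$$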



Hence the value of $\mathbf{E}[f(X)]$ is determined up to $O(\varepsilon)$.



The following is a corollary of Lemma 2.4 which is more readily applicable.

Corollary 2.7. Let $n$ be a positive integer and $0 < \varepsilon < 1$. Let $f(z)$ be a holomorphic function on
$\mathbb{C}$ so that $|f(z)| = e^{O(1+|\Im(z)|)}$. Let $k = c\log(1/\varepsilon)/\log\log\log(1/\varepsilon)$ for a sufficiently large constant
$c > 0$. Let $a_i$ be real numbers for $1 \le i \le n$. Let $C > 0$ be a real number so that $\|a\|_p = O(C)$.
Let $X_i$ be a $k$-wise independent family of $p$-stable random variables, $Z$ be a single $p$-stable random
variable, and $X = \sum_i a_i X_i$. Then, $\mathbf{E}[f(X/C)] = \mathbf{E}[f(\|a\|_p Z/C)] + O(\varepsilon)$.

Proof. Apply Lemma 2.4 with the vector whose entries are $a_i/C$, so that its $p$-norm is $O(1)$ and
$Y = \sum_i (a_i/C) Y_i$ has the same distribution as $\|a\|_p Z/C$. ■

We now show the implications of Corollary 2.7.

Lemma 2.8. Let $a_i$ be real numbers for $1 \le i \le n$. Let $k$ and $r$ be suitably large constants, and
let $X_{i,j}$ be a 2-wise independent family of $k$-wise independent $p$-stable random variables ($1 \le i \le n$,
$1 \le j \le r$). Then the median value across all $j$ of $|\sum_i a_i X_{i,j}|$ is within a constant multiple of $\|a\|_p$
with probability tending to 1 as $k$ and $r$ tend to infinity, independent of $n$.

Proof. We apply Corollary 2.7 to a suitable function $f$ which

1. is strictly positive for all $x \in \mathbb{R}$,

2. is an even function, and

3. decreases strictly monotonically to 0 as $x$ tends away from 0.

We note

$$f(x) = -\int_{-\infty}^{x} \frac{\sin^4(y)}{y^3}\,dy$$

satisfies these properties. A thorough explanation of why $f$ satisfies the desired properties is in
Section A.2. Henceforth, for $0 < z \le f(0)$, $f^{-1}(z)$ denotes the (unique) nonnegative inverse of $z$.

We consider for constants $C = \Theta(1)$ the random variable

$$A_C = \frac{1}{r}\sum_{j=1}^{r} f\left(\frac{C\sum_i a_i X_{i,j}}{\|a\|_p}\right).$$

By Corollary 2.7, if $Z$ is a $p$-stable variable, then $\mathbf{E}[A_C] = \mathbf{E}[f(CZ)] + O(1)$, where the $O(1)$ term
can be made arbitrarily small by choosing $k$ sufficiently large. Furthermore since $f$ is bounded and
the terms in the sum over $j$ defining $A_C$ are 2-wise independent, $\mathrm{Var}(A_C) = O(1/r)$. Thus by
Chebyshev's inequality, for $k, r$ sufficiently large, $A_C$ is within any desired constant of $\mathbf{E}[f(CZ)]$
with probability arbitrarily close to 1.

We apply the above for a $C > 0$ large enough that $\mathbf{E}[f(CZ)] < f(0)/3$, and $C' > 0$ small
enough that $\mathbf{E}[f(C'Z)] > 2f(0)/3$. By picking $r$ sufficiently large, then with any desired constant
probability we can ensure $A_C < x < f(0)/2 < y < A_{C'}$, for some constants $x > f(0)/3$ and
$y < 2f(0)/3$ of our choosing; to be concrete, pick $x = 4f(0)/9$ and $y = 5f(0)/9$. In order for this
to hold it must be the case that for at least half of the $j$'s that

$$f\left(\frac{C\sum_i a_i X_{i,j}}{\|a\|_p}\right) < 2x < f(0).$$

This bounds the median of $|\sum_i a_i X_{i,j}|$ from below by

$$\frac{f^{-1}(8f(0)/9)}{C}\,\|a\|_p > \frac{2}{5C}\,\|a\|_p.$$

Similarly, it must also be the case that for at least half of the $j$'s

$$f\left(\frac{C'\sum_i a_i X_{i,j}}{\|a\|_p}\right) > 2(y - f(0)/2) > 0.$$

This bounds the median of $|\sum_i a_i X_{i,j}|$ from above by

$$\frac{f^{-1}(f(0)/9)}{C'}\,\|a\|_p.$$
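As a sanity check on the constant in the lower bound, the sketch below evaluates $f^{-1}(8f(0)/9)$ numerically for $f(x) = -\int_{-\infty}^{x} \sin^4(y)/y^3\,dy$. It is an illustration under stated assumptions, not the computation used by the authors: we use the closed form $f(0) = \int_0^\infty \sin^4(y)/y^3\,dy = \ln 2$ (obtained by integrating by parts twice and applying Frullani's integral), Simpson's rule for the integral, and bisection for the inverse. All function names are ours.

```python
import math


def integrand(y: float) -> float:
    """sin^4(y) / y^3, extended by its limit 0 at y = 0."""
    return 0.0 if y == 0.0 else math.sin(y) ** 4 / y ** 3


def integral_from_zero(x: float, panels: int = 2000) -> float:
    """Composite Simpson approximation of the integral of `integrand` over [0, x]."""
    if x == 0.0:
        return 0.0
    h = x / panels
    total = integrand(0.0) + integrand(x)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


F_AT_ZERO = math.log(2)  # assumed closed form for f(0), see the lead-in


def f(x: float) -> float:
    """f(x) = f(0) - int_0^{|x|} sin^4(y)/y^3 dy, using that f is even."""
    return F_AT_ZERO - integral_from_zero(abs(x))


def f_inverse(z: float, hi: float = 1.0) -> float:
    """Nonnegative inverse of f on [0, hi] by bisection (f is decreasing there).

    Requires f(hi) <= z <= f(0).
    """
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    inv = f_inverse(8 * F_AT_ZERO / 9)
    print("f^{-1}(8 f(0) / 9) is about", inv, "; exceeds 2/5:", inv > 0.4)
```

Under these assumptions the inverse comes out only slightly above $2/5$, so the constant in the lower bound has little slack.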



The bounds on $f^{-1}(8f(0)/9)$ and $f^{-1}(f(0)/9)$ were verified by computer. Comments on computing
C, C are in Section |X1 ■ 
Lemma 2.9. Given $\varepsilon > 0$, $k$ as in Corollary 2.7, $r$ a suitably large multiple of $\varepsilon^{-2}$, $C = \Theta(\|a\|_p)$,
and $X_{i,j}$ ($1 \le i \le n$, $1 \le j \le r$) a 2-independent family of $k$-independent families of $p$-stable random
variables, then with probability that can be made arbitrarily close to 1 (by increasing $r$), it holds
that

$$\left|\frac{1}{r}\sum_{j=1}^{r}\cos\left(\frac{\sum_i a_i X_{i,j}}{C}\right) - e^{-\|a\|_p^p/C^p}\right| < \varepsilon.$$

Proof. The Fourier transform of the probability density function $q(x)$ of $\mathcal{D}_p$ is $\hat{q}(\xi) = e^{-|\xi|^p}$.
Letting $B = \|a\|_p/C$, the expectation of $\cos(BZ)$ for $Z \sim \mathcal{D}_p$ is

$$\frac{\mathbf{E}[e^{iBZ}] + \mathbf{E}[e^{-iBZ}]}{2} = \frac{\hat{q}(B) + \hat{q}(-B)}{2} = e^{-B^p}.$$

By Corollary 2.7, if $k$ is sufficiently large, the expected value of $\left(\sum_j \cos\left((\sum_i a_i X_{i,j})/C\right)\right)/r$ is within
$\varepsilon/2$ of $e^{-\|a\|_p^p/C^p}$. Noting that each term in the sum is bounded by 1, and that they form a
2-independent family of random variables, we have that the variance of our estimator is upper
bounded by $1/r$. Hence by Chebyshev's inequality, if $r$ is chosen to be a suitably large multiple of
$\varepsilon^{-2}$, then with the desired probability our estimator is within $\varepsilon/2$ of its expected value. ■

Now we prove our main theorem. 
Proof (of Theorem 2.1). In Figure 2, as long as $k, r$ are chosen to be larger than some constant, $A$ is
a constant factor approximation to $\|a\|_p$ by Lemma 2.8 with probability at least 7/8. Conditioned
on this, consider $C = \left(\sum_j \cos(A_j/A)\right)/r$. By Lemma 2.9, with probability at least 7/8, $C$ is
within $O(\varepsilon)$ of $e^{-(\|a\|_p/A)^p}$, from which a $(1 + O(\varepsilon))$-approximation of $\|a\|_p$ can be computed as
$A \cdot (-\ln(C))^{1/p}$. Note that our approximation is in fact a $(1 + O(\varepsilon))$-approximation since the
function $f(x) = e^{-|x|^p}$ is bounded both from above and below by constants for $x$ in a constant-sized
interval (in our case, $x$ is $\|a\|_p/A$), and thus an additive $O(\varepsilon)$-approximation to $e^{-|x|^p}$ is also
a multiplicative $(1 + O(\varepsilon))$-approximation.
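The two-stage estimate described in this proof can be sketched in a few lines. This is an illustration only: it takes the sketch values $A_j = \sum_i a_i X_{i,j}$ as given, uses the plain median of $|A_j|$ as the constant-factor estimate $A$, and the demonstration draws fully independent Cauchy ($p = 1$) samples rather than the limited-independence families analyzed above. The function names are ours.

```python
import math
import random
import statistics


def invert_cosine_mean(A: float, C: float, p: float) -> float:
    """Solve C = exp(-(L / A)^p) for L, i.e. return A * (-ln C)^(1/p)."""
    return A * (-math.log(C)) ** (1.0 / p)


def lp_estimate(sketch, p: float) -> float:
    """Median-then-cosine estimate of ||a||_p from sketch values A_j."""
    # Stage 1: a constant-factor estimate from the median of |A_j|.
    A = statistics.median([abs(v) for v in sketch])
    # Stage 2: the empirical mean of cos(A_j / A), close to exp(-(||a||_p / A)^p).
    C = sum(math.cos(v / A) for v in sketch) / len(sketch)
    return invert_cosine_mean(A, C, p)


if __name__ == "__main__":
    random.seed(12345)
    true_norm = 7.0
    # For p = 1, sum_i a_i X_i with independent standard Cauchy X_i is Cauchy
    # with scale ||a||_1, so we sample the sketch values directly.
    sketch = [true_norm * math.tan(math.pi * (random.random() - 0.5))
              for _ in range(20000)]
    print("estimate:", lp_estimate(sketch, 1.0), "true norm:", true_norm)
```

The second stage does not need $A$ to be accurate, only within a constant factor of $\|a\|_p$, since the inversion step rescales by $A$ again.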

There are though still two basic problems with the algorithm of Figure 2. The first is that we
cannot store the values of $X_j$ to unlimited precision, and will at some point have rounding errors.
The second problem is that we can only produce families of random variables with finite entropy
and hence cannot keep track of a family of continuous random variables.

We deal with the precision problem first. We will pick some number $\delta = \Theta(\varepsilon m^{-1})$. We round
each $X_{i,j}$ to the nearest multiple of $\delta$. This means that we only need to store the $X_j$ to a precision
of $\delta$. This does produce an error in the value of $X_j$ of size at most $\|a\|_1\delta \le |\{i : a_i \ne 0\}|\max(|a_i|)\delta \le
m\|a\|_p\delta = \Theta(\varepsilon\|a\|_p)$. This means that $C$ is going to be off by a factor of at most $O(\varepsilon)$, and hence
still probably within a constant multiple of $\|a\|_p$. Hence the values of $X_j/C$ will be off by $O(\varepsilon)$, so
the values of $A$ and our approximation for $\|a\|_p$ will be off by an additional factor of $O(\varepsilon)$.

Next we need to determine how to compute these continuous distributions. It was shown by
[12] that a $p$-stable random variable can be generated by taking $\theta$ uniform in $[-\pi/2, \pi/2]$, $r$ uniform
in $[0, 1]$ and letting

$$X = f(r, \theta) = \frac{\sin(p\theta)}{\cos^{1/p}(\theta)} \cdot \left(\frac{\cos(\theta(1-p))}{\log(1/r)}\right)^{(1-p)/p}.$$
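The generation formula above translates directly into code. The sketch below is a minimal illustration that uses double-precision floats and Python's `random` module in place of the finite-precision, limited-independence sources discussed in the text; the function names are ours.

```python
import math
import random


def p_stable_from_uniforms(p: float, theta: float, r: float) -> float:
    """Evaluate the formula above for theta in (-pi/2, pi/2) and r in (0, 1)."""
    scale = math.sin(p * theta) / math.cos(theta) ** (1.0 / p)
    ratio = math.cos(theta * (1.0 - p)) / math.log(1.0 / r)
    return scale * ratio ** ((1.0 - p) / p)


def sample_p_stable(p: float, rng: random.Random) -> float:
    """Draw one sample, redrawing r in the measure-zero boundary cases."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    r = rng.random()
    while r == 0.0:
        r = rng.random()
    return p_stable_from_uniforms(p, theta, r)


if __name__ == "__main__":
    rng = random.Random(0)
    print([round(sample_p_stable(1.5, rng), 3) for _ in range(5)])
```

Two special cases make a quick check: for $p = 1$ the exponent $(1-p)/p$ vanishes and the formula reduces to $\tan\theta$ (a Cauchy variable), while for $p = 2$ it reduces to $2\sin(\theta)\sqrt{\log(1/r)}$, a Box-Muller style Gaussian.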

We would like to know how much of an error is introduced by using values of $r$ and $\theta$ only accurate
to within $\delta'$. This error is at most $\delta'$ times the derivative of $f$. This derivative is not large except
when $\theta$ or $(1-p)\theta$ is close to $\pm\pi/2$, or when $r$ is close to 0 or 1. Since we only ever need $mr$ different
values of $X_{i,j}$, we can assume that with reasonable probability we never get an $r$ or $\theta$ closer to these
values than $O(m^{-1}\varepsilon^2)$. In such a case the derivative will be bounded by $(m\varepsilon^{-1})^{O(1)}$. Therefore, if
we choose $r$ and $\theta$ with a precision of $(m^{-1}\varepsilon)^{O(1)}$, we can get the value of $X$ while introducing an
error of only $\delta$.

Lastly, we need to consider memory requirements. Our family must be a 2-independent family
containing $O(\varepsilon^{-2})$ $k$-independent families of $U$ random variables. Each random variable requires
$O(\log(m\varepsilon^{-1}))$ bits. The amount of space needed to pick out an element of this family is only
$O(k(\log(U) + \log(m\varepsilon^{-1}))) = O(k\log(m/\varepsilon)) = O(k\log m)$ (recall we can assume $\log(U) = O(\log N)$,
and $\varepsilon \ge 1/\sqrt{m}$). More important is the information needed to store the $X_j$. We need to store
them to a precision of $\delta$. Since there are only $mr$ values of $X_{i,j}$, with reasonable probability, none
of them is bigger than a polynomial in $mr$. If this is the case, the maximum value of any $X_j$ is at
most $(mM\varepsilon^{-1})^{O(1)}$. Hence each $X_j$ can be stored in $O(\log(mM\varepsilon^{-1})) = O(\log(mM))$ space, thus
making the total space requirements $O(\varepsilon^{-2}\log(mM))$. ■

2.2 Derandomizing Lp Estimation via Armoni's PRG 

Indyk [25], and later Li p2], gave algorithms for Lp estimation which are also based on p-stable 
distributions. Their algorithms differ from ours in Figure 2 in two ways. First, both Indyk and Li
made the variables $X_{i,j}$ in Step 1 truly random as opposed to having limited independence. Second,
the estimator they use in Step 2 differs. Indyk uses a median estimator on the $|A_j|$, and Li has two
estimators: one based on the geometric mean, and one on the harmonic mean. The change in Step
1 at first seems to make the algorithms of Indyk and Li not implementable in small space, since
there are $n/\varepsilon^2$ random variables $X_{i,j}$ to be stored. Indyk though observed that his algorithm could
be derandomized by using a PRG against small-space computation, and invoked Nisan's PRG to
derandomize his algorithm. Doing so multiplied his space complexity by a $\log(N/\varepsilon)$ factor. Li then
similarly used Nisan's PRG to derandomize his algorithm.

Nisan's PRG [50] stretches a seed of $O(S\log R)$ random bits to $R$ "pseudorandom" bits fooling
any space-$S$ algorithm with one-way access to its randomness. We show that a PRG construction
of Armoni [3] can be combined with a more space-efficient implementation of a recent extractor
of Guruswami, Umans, and Vadhan (GUV) [23] to produce a PRG whose seed length is only
$O((S/(\log S - \log\log R + O(1)))\log R)$ for any $R = 2^{O(S)}$. Due to the weaknesses of extractor
constructions at the time, Armoni's original PRG only worked when $R < 2^{S^{1-\delta}}$ for constant $\delta > 0$.
In the cases of Indyk and Li, $S = O(\varepsilon^{-2}\log(mM))$ and $R = \mathrm{poly}(N)/\varepsilon^2$. The key here is that
although $N$ can be exponentially large in $\log(mM)$, the dependence on $\varepsilon$ in both $S$ and $R$ are
polynomially related. The result is that using the improved Armoni PRG provides a more efficient
derandomization than Nisan's PRG by a $\log(1/\varepsilon)$ factor, giving the following.

Theorem 2.10. The $L_p$-estimation algorithms of Indyk and Li can be implemented in space
$O(\varepsilon^{-2}\log(mM)\log(N/\varepsilon)/\log(1/\varepsilon))$.

Most of our changes to the implementation of the GUV extractor are parameter changes which
guarantee that we always work over a field for which a highly explicit family of irreducible
polynomials is known. For example, we change the parameters of an expander construction of GUV based
on Parvaresh-Vardy codes [42] which feeds into their extractor construction. Doing so allows us to
replace calls to Shoup's algorithm for finding irreducibles over $\mathbb{F}_2[x]$, which uses superlinear space,
with using two explicit families of irreducibles over $\mathbb{F}_7[x]$ with a few properties. One property we
need is that if we define extension fields using polynomials from one family, then the polynomials
from the other family remain irreducible over these extension fields. Full details are in Section A.5.



3 Lower Bounds 

In this section we prove our lower bounds for $(1 \pm \varepsilon)$-multiplicative approximation of $F_p$ for any real
constant $p \ge 0$ when deletions are allowed. When $p \ge 0$, we prove an $\Omega(\varepsilon^{-2}\log(\varepsilon^2 N))$ lower bound.
When $p$ is a constant strictly greater than 0, the lower bound improves to $\Omega(\min\{N, \varepsilon^{-2}\log(\varepsilon^2 mM)\})$.
All our lower bounds assume $\varepsilon \ge 1/\sqrt{N}$. We also point out that $\Omega(\log\log(nmM))$ is a folklore
lower bound for all problems we consider in the strict turnstile model by a direct reduction from
Equality. In the update-only model, there is a folklore $\Omega(\log\log n)$ lower bound. Both lower
bounds assume $m \ge 2$. Our lower bounds hold for all ranges of the parameters $\varepsilon, n, m, M$ varying
independently.

Our proof in part uses the fact that Augmented-Indexing requires a linear amount of
communication in the one-way, one-round model [1, 36]. We also use a known reduction [30, 50] from
Indexing to Gap-Hamdist. Henceforth all communication games discussed will be one-round and
two-player, with the first player to speak named "Alice", and the second "Bob". We assume that
Alice and Bob have access to public randomness.

Definition 3.1. In the Augmented-Indexing problem, Alice receives a vector $x \in \{0,1\}^n$, Bob
receives some $i \in [n]$ as well as all $x_j$ for $j > i$, and Bob must output $x_i$. The problem Indexing
is defined similarly, except Bob receives only $i \in [n]$, without receiving $x_j$ for $j > i$.

Definition 3.2. In the Gap-Hamdist problem, Alice receives $x \in \{0,1\}^n$ and Bob receives
$y \in \{0,1\}^n$. Bob is promised that either $\Delta(x,y) \le n/2 - \sqrt{n}$ (NO instance), or $\Delta(x,y) \ge n/2 + \sqrt{n}$
(YES instance) and must decide which case holds. Here $\Delta(\cdot,\cdot)$ denotes the Hamming distance.

The following two theorems are due to [1, 36] and [30, 50].

Theorem 3.3 (Miltersen et al. [36], Bar-Yossef et al. [1]). The randomized one-round, one-way
communication complexity of solving Augmented-Indexing with probability at least 2/3 is $\Omega(n)$.
Furthermore, this lower bound holds even if Alice's and Bob's inputs are each chosen independently,
uniformly at random. The lower bound also still holds if Bob only receives a subset of the $x_j$ for
$j > i$. ■

Theorem 3.4 (Jayram et al. [30], Woodruff [50, Section 4.3]). There is a reduction from Indexing
to Gap-Hamdist such that the uniform (i.e. hard) distribution over Indexing instances is mapped
to a distribution over Gap-Hamdist instances where each of Alice and Bob receive strings whose
marginal distribution is uniform, and deciding Gap-Hamdist over this distribution with probability
at least 11/12 implies a solution to Indexing with probability at least 2/3. Also, in this reduction
the vector length $n$ in Indexing is the same as the vector length in the reduced Gap-Hamdist
instance to within a constant factor. ■

We now give our lower bounds. We use the following observation in the proof of Theorem 3.6.

Observation 3.5. For two binary vectors $u, v$ of equal length, let $\Delta(u,v)$ denote their Hamming
distance. Then for any $p \ge 0$, $(2^p - 2)\Delta(u,v) = 2^p\|u\|_1 + 2^p\|v\|_1 - 2\|u+v\|_p^p$.
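Observation 3.5 can be verified exhaustively on short binary vectors. The following sketch does so; the function names are ours, and for $p = 0$ we use the convention that $\|w\|_0^0$ counts the nonzero coordinates of $w$. The identity holds coordinate by coordinate: a position where both vectors are 1 contributes 0 to both sides, and a position where exactly one is 1 contributes $2^p - 2$.

```python
from itertools import product


def hamming(u, v):
    """Number of coordinates where u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def observation_3_5_sides(u, v, p):
    """Return (lhs, rhs) of the identity in Observation 3.5 for binary u, v."""
    lhs = (2 ** p - 2) * hamming(u, v)
    # ||u + v||_p^p, skipping zero coordinates so that p = 0 counts the support.
    moment = sum((a + b) ** p for a, b in zip(u, v) if a + b > 0)
    rhs = 2 ** p * sum(u) + 2 ** p * sum(v) - 2 * moment
    return lhs, rhs


if __name__ == "__main__":
    ok = all(
        abs(lhs - rhs) < 1e-9
        for p in (0, 0.5, 1, 1.5, 2)
        for u in product((0, 1), repeat=4)
        for v in product((0, 1), repeat=4)
        for lhs, rhs in [observation_3_5_sides(u, v, p)]
    )
    print("Observation 3.5 holds on all length-4 binary pairs:", ok)
```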

Theorem 3.6. For any real constant $p \ge 0$, any one-pass streaming algorithm for $(1 \pm \varepsilon)$-multiplicative
approximation of $F_p$ with probability at least 11/12 in the strict turnstile model
requires $\Omega(|p-1|^2\varepsilon^{-2}\log(\varepsilon^2 N/|p-1|^2))$ bits of space.

Proof. Given an algorithm $\mathcal{A}$ providing a $(1 \pm d|p-1|\varepsilon)$-multiplicative approximation of $F_p$ with
probability at least 11/12, where $d > 0$ is some constant to be fixed later, we devise a protocol to
decide Augmented-Indexing on strings of length $\varepsilon^{-2}\log(\varepsilon^2 N)$.

Let Alice receive $x \in \{0,1\}^{\varepsilon^{-2}\log(\varepsilon^2 N)}$, and Bob receive $z \in [\varepsilon^{-2}\log(\varepsilon^2 N)]$. Alice divides $x$
into $\log(\varepsilon^2 N)$ contiguous blocks where the $i$th block $b_i$ is of size $1/\varepsilon^2$. Bob's index $z$ lies in some
$b_{i(z)}$, and Bob receives bits $x_j$ that lie in a block $b_i$ with $i > i(z)$. Alice applies the Gap-Hamdist
reduction of Theorem 3.4 to each $b_i$ separately to obtain new vectors $y_i$ each of length at most $c/\varepsilon^2$
for some constant $c$ for all $0 \le i < \log(\varepsilon^2 N)$. Alice then creates a stream from the set of $y_i$ by, for
each $i$ and each bit $(y_i)_j$ of $y_i$, imagining universe elements $(i,j,1), \ldots, (i,j,2^i)$ and inserting them
all into the stream if $(y_i)_j = 1$, and not inserting them otherwise. Alice processes this stream with
$\mathcal{A}$ then sends the state of $\mathcal{A}$ to Bob along with the Hamming weight $w(y_i)$ of $y_i$ for all $i$. Note the
size of the universe in the stream is at most $c\varepsilon^{-2}\sum_{i=0}^{\log(\varepsilon^2 N)-1} 2^i = O(N) = O(n)$.

Now, since Bob knows the bits in $b_i$ for $i > i(z)$ and shares randomness with Alice, he can
run the same Gap-Hamdist reduction as Alice to obtain the $y_i$ for $i > i(z)$ then delete all the
insertions Alice made for these $y_i$. Bob then performs his part of the reduction from Indexing on
strings of length $1/\varepsilon^2$ to Gap-Hamdist within the block $b_{i(z)}$ to obtain a vector $y(B)$ such that
deciding whether $\Delta(y(B), y_{i(z)}) > \varepsilon^{-2}/2 + 1/\varepsilon$ or $\Delta(y(B), y_{i(z)}) < \varepsilon^{-2}/2 - 1/\varepsilon$ with probability
at least 11/12 allows one to decide the Indexing instance with probability at least 2/3. Here
$\Delta(\cdot,\cdot)$ denotes Hamming distance. For each $j$ such that $y(B)_j = 1$, Bob inserts universe elements
$(i(z),j,1), \ldots, (i(z),j,2^{i(z)})$ into the stream being processed by $\mathcal{A}$. We have so far described all
stream updates, and thus the number of updates is at most $2c\varepsilon^{-2}\sum_{i=0}^{\log(\varepsilon^2 N)-1} 2^i = O(N) = O(m)$.
By Observation 3.5 with $u = y_{i(z)}$ and $v = y(B)$, the $p$th moment $L''$ of the stream now exactly
satisfies

$$L'' = 2^{i(z)}\left((1 - 2^{p-1})\Delta(y(B), y_{i(z)}) + 2^{p-1}w(y_{i(z)}) + 2^{p-1}w(y(B))\right) + \sum_{i<i(z)} w(y_i)2^i.$$

Setting $\eta = \sum_{i<i(z)} w(y_i)2^i$ and rearranging terms,

$$\Delta(y(B), y_{i(z)}) = \frac{2^{p-1}}{2^{p-1}-1}\,w(y_{i(z)}) + \frac{2^{p-1}}{2^{p-1}-1}\,w(y(B)) + \frac{2^{-i(z)}(\eta - L'')}{2^{p-1}-1}.$$
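The rearranged identity can be checked by simulating the stream on a toy instance. The sketch below is illustrative only: it skips the Gap-Hamdist reduction and uses hypothetical vectors directly as the $y_i$ and $y(B)$, builds the frequency vector after Bob's deletions and insertions, computes the exact $p$th moment $L''$, and recovers the Hamming distance from $L''$, $\eta$ and the Hamming weights. The function names are ours.

```python
from collections import Counter


def pth_moment(freq, p):
    """Exact F_p of a frequency map; zero entries are skipped so p = 0 works."""
    return sum(abs(c) ** p for c in freq.values() if c != 0)


def build_frequencies(blocks, bob_vec, iz):
    """Frequencies after Alice's inserts for blocks 0..iz and Bob's inserts.

    Alice's blocks with index > iz are omitted, modelling Bob's deletions.
    """
    freq = Counter()
    for i in range(iz + 1):
        for j, bit in enumerate(blocks[i]):
            if bit:
                for t in range(2 ** i):          # replicate 2^i times
                    freq[(i, j, t)] += 1
    for j, bit in enumerate(bob_vec):
        if bit:
            for t in range(2 ** iz):
                freq[(iz, j, t)] += 1
    return freq


def recover_distance(L, blocks, bob_vec, iz, p):
    """Solve the displayed identity for the Hamming distance (requires p != 1)."""
    eta = sum(sum(blocks[i]) * 2 ** i for i in range(iz))
    a = 2 ** (p - 1)
    w_alice, w_bob = sum(blocks[iz]), sum(bob_vec)
    return (a * w_alice + a * w_bob + 2 ** (-iz) * (eta - L)) / (a - 1)


if __name__ == "__main__":
    blocks = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]   # hypothetical y_0, y_1, y_2
    bob = [1, 0, 0, 1]                                    # hypothetical y(B)
    for p in (0, 0.5, 1.7, 2):
        L = pth_moment(build_frequencies(blocks, bob, 2), p)
        print(p, recover_distance(L, blocks, bob, 2, p))
```

In this toy instance the true distance between `blocks[2]` and `bob` is 1, and the recovered value agrees up to floating-point error for every $p \ne 1$ tried.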

Recall that in this Gap-Hamdist instance, Bob must decide whether $\Delta(y(B), y_{i(z)}) < 1/2\varepsilon^2 - 1/\varepsilon$
or $\Delta(y(B), y_{i(z)}) > 1/2\varepsilon^2 + 1/\varepsilon$. Bob knows $\eta$, $w(y_{i(z)})$, and $w(y(B))$ exactly. To decide
Gap-Hamdist it thus suffices to obtain a $((2^{p-1} - 1)/(4\varepsilon))$-additive approximation to $2^{-i(z)}L''$. Since
$2^{-i(z)}L''$ is upper-bounded in absolute value by $(1 + 2^p)/\varepsilon^2$, our desired additive approximation
is guaranteed by obtaining a $(1 \pm ((2^{p-1} - 1)\varepsilon/(4\cdot(1 + 2^p))))$-multiplicative approximation to
$L''$. Since $p \ne 1$ is a constant and $|2^x - 1| = \Theta(|x|)$ as $x \to 0$, this is a $(1 \pm O(|p-1|\varepsilon))$-multiplicative
approximation, which we can obtain from $\mathcal{A}$ by setting $d$ to be a sufficiently large
constant. Recalling that $\mathcal{A}$ provides this $(1 \pm O(|p-1|\varepsilon))$-approximation with probability at least
11/12, we solve Gap-Hamdist in the block $i(z)$ with probability at least 11/12, and thus Indexing
in block $i(z)$ with probability at least 2/3 by Theorem 3.4. Note this is equivalent to solving the
original Augmented-Indexing instance.

The only bits communicated other than the state of $\mathcal{A}$ are the transmissions of $w(y_i)$ for $0 \le i <
\log(\varepsilon^2 N)$. Since $w(y_i) \le 1/\varepsilon^2$, all Hamming weights can be communicated in $O(\log(1/\varepsilon)\log(\varepsilon^2 N)) =
o(\varepsilon^{-2}\log(\varepsilon^2 N))$ bits. By the lower bound on Augmented-Indexing from Theorem 3.3, we thus
have that $(1 \pm d|p-1|\varepsilon)$-approximation requires $\Omega(\varepsilon^{-2}\log(\varepsilon^2 N))$ bits of space for some constant
$d > 0$. In other words, setting $\varepsilon' = d|p-1|\varepsilon$ we have that a $(1 \pm \varepsilon')$-approximation requires
$\Omega(|p-1|^2\varepsilon'^{-2}\log(\varepsilon'^2 N/|p-1|^2))$ bits of space. ■

When $p$ is strictly positive, we can improve our lower bound by gaining a dependence on $mM$
rather than $N$, obtaining the following lower bound.

Theorem 3.7. For any real constant $p > 0$, any one-pass streaming algorithm for $(1 \pm \varepsilon)$-multiplicative
approximation of $F_p$ with probability at least 11/12 in the strict turnstile model
requires $\Omega(\min\{N, |p-1|^2\varepsilon^{-2}\log(\varepsilon^2 mM/|p-1|^2)\})$ bits of space.

Proof. In the proof of Theorem 3.6, Alice divided her input $x$ into $\log(\varepsilon^2 N)$ blocks each of equal
size and used the $i$th block to create an instance of Gap-Hamdist. However, in order to have
the weight of each block's contribution to the stream increase geometrically, Alice had to replicate
each coordinate in the $i$th block $2^i$ times. Now, instead, round $M$ to the nearest power of $2^{1/p}$
and let Alice's input be a string $x$ of length $\varepsilon^{-2}\min\{\log_{2^{1/p}} M, \varepsilon^2 N\}$. Dividing her input into
$\min\{\log_{2^{1/p}} M, \varepsilon^2 N\}$ blocks, Alice does not replicate any coordinate in a block $i$ but rather gives
each coordinate frequency $2^{i/p}$. By choice of the number of blocks, no item's frequency will be
larger than $M$, and the number of universe elements and the stream length will each be at most
$N$. These frequencies $f_1, f_2, \ldots$ are chosen so that $f_i^p = 2^i$. Similarly to Observation 3.5, for two
vectors $u, v$ of equal length where each coordinate is either $t$ or 0 (in Observation 3.5 the vectors
were binary), for any $p \ge 0$ we have $t^p(2^p - 2)\Delta(u,v) = t^p 2^p\|u\|_0 + t^p 2^p\|v\|_0 - 2\|u+v\|_p^p$, where
$\Delta(u,v)$ is the Hamming distance of $u, v$ and $\|\cdot\|_0$ counts nonzero coordinates.

Following the same steps as in Theorem 3.6 with the same notation, one arrives at

$$\Delta(y(B), y_{i(z)}) = \frac{2^{p-1}}{2^{p-1}-1}\,w(y_{i(z)}) + \frac{2^{p-1}}{2^{p-1}-1}\,w(y(B)) + \frac{2^{-i(z)}(\eta - L'')}{2^{p-1}-1}$$

since $f_{i(z)}^p = 2^{i(z)}$. For deciding Gap-Hamdist in block $i(z)$ it suffices to obtain a
$2^{i(z)}(2^{p-1} - 1)/(4\varepsilon)$-additive approximation to $L''$. Since $L'' \le 2^{i(z)}(2^p + 1)/\varepsilon^2$, the desired additive
approximation can be obtained by a $(1 \pm ((2^{p-1} - 1)\varepsilon/(4\cdot(2^p + 1))))$-multiplicative approximation,
just as in Theorem 3.6. The rest of the proof is identical to that of Theorem 3.6.

The above argument yields the lower bound $\Omega(\min\{N, \varepsilon^{-2}\log(M)\})$. We can similarly obtain the
lower bound $\Omega(\min\{N, \varepsilon^{-2}\log(\varepsilon^2 m)\})$ by, rather than updating an item in the stream by $f_i = 2^{i/p}$ in
one update, updating the same item $f_i$ times by 1. The number of total updates in the $i$th block
is then $2^{i/p}/\varepsilon^2$, and thus the maximum number of blocks we can give Alice to ensure that both the
stream length and number of used universe elements is at most $N$ is $\min\{\varepsilon^2 N, O(\log(\varepsilon^2 m))\}$. ■

The decay of our lower bounds as $p \to 1$ is necessary in the strict turnstile model since Li gave an
algorithm in this model whose dependence on $\varepsilon$ becomes subquadratic as $p \to 1$ [33]. Furthermore,
when $p = 1$ there is an $O(\log(mM))$-space deterministic algorithm for computing $F_1$: maintain a
counter. In the turnstile model, for $p > 0$ we give a lower bound matching Theorem 3.7 but without
any decay as $p \to 1$.

Theorem 3.8. For any real constant $p > 0$, any one-pass streaming algorithm for $(1 \pm \varepsilon)$-multiplicative
approximation of $F_p$ in the turnstile model with probability at least 11/12 requires
$\Omega(\min\{N, \varepsilon^{-2}\log(\varepsilon^2 mM)\})$ bits of space.

Proof. As in Theorem 3.7, Alice receives an input string $x$ of length $\varepsilon^{-2}\min\{\log M, \varepsilon^2 N\}$ as
opposed to the string of length $\varepsilon^{-2}\log(\varepsilon^2 N)$ in Theorem 3.6. Also, Alice carries out her part
of the protocol just as in Theorem 3.7. However, for each $j$ such that $y(B)_j = 1$, rather than
inserting a universe element with frequency $2^{i(z)/p}$, Bob deletes it with that frequency. Now we
have $L''$, the $p$th moment of the stream, exactly equals $2^{i(z)}\Delta(y(B), y_{i(z)}) + \sum_{i<i(z)} w(y_i)2^i$, and
thus $\Delta(y(B), y_{i(z)}) = 2^{-i(z)}(L'' - \eta)$. As in Theorem 3.6, Bob knows $\eta$ exactly and thus only needs
a $(1/(4\varepsilon))$-additive approximation to $L''2^{-i(z)}$ to decide the Gap-Hamdist instance (and thus the
original Augmented-Indexing instance), which he can obtain via a $(1 \pm (\varepsilon/8))$-approximation to
$L''$ since $L''2^{-i(z)} \le 2/\varepsilon^2$. ■

Our technique also improves the known lower bound for additively estimating the entropy of
a stream in the strict turnstile model. The proof combines ideas of [10] with our technique of
embedding geometrically-growing hard instances. By entropy of the stream, we mean the entropy of the empirical
probability distribution on $[n]$ obtained by setting $p_i = a_i/\|a\|_1$.

Theorem 3.9. Any algorithm for $\varepsilon$-additive approximation of $H$, the entropy of a stream, in the
strict turnstile model with probability at least 11/12 requires space $\Omega(\varepsilon^{-2}\log(N)/\log(1/\varepsilon))$.

Proof. We reduce from Augmented-Indexing, as in Theorem 3.6. Alice receives a string of
length $s = \log N/(2\varepsilon^2\log(1/\varepsilon))$, and Bob receives an index $z \in [s]$. Alice conceptually divides her
input into $b = \varepsilon^2 s$ blocks, each of size $1/\varepsilon^2$, and reduces each block using the Indexing→Gap-Hamdist
reduction of Theorem 3.4 to obtain $b$ Gap-Hamdist instances with strings $y_1, \ldots, y_b$,
each of length $\ell = \Theta(1/\varepsilon^2)$. For each $1 \le i \le b$, and $1 \le j \le \ell$, Alice inserts universe elements
$(i, j, 1, (y_i)_j), \ldots, (i, j, \varepsilon^{-2i}, (y_i)_j)$ into the stream and sends the state of a streaming algorithm to
Bob.

Bob identifies the block $i(z)$ in which $z$ lands and deletes all stream elements associated with
blocks with index $i > i(z)$. He then does his part in the Indexing→Gap-Hamdist reduction
to obtain a vector $y(\mathrm{Bob})$ of length $\ell$. For all $1 \le j \le \ell$, he inserts the universe elements
$(i(z), j, 1, y(\mathrm{Bob})_j), \ldots, (i(z), j, \varepsilon^{-2i(z)}, y(\mathrm{Bob})_j)$ into the stream.

The number of stream tokens from block indices $i < i(z)$ is $A = \varepsilon^{-2}\sum_{i=0}^{i(z)-1} \varepsilon^{-2i}$.
The number of tokens in block $i(z)$ from Alice and Bob combined is $2\varepsilon^{-(2i(z)+2)}$. Define $B = \varepsilon^{-2i(z)}$
and $C = \varepsilon^{-2}$. The $L_1$ weight of the stream is $R = A + 2BC$. Let $\Delta$ denote the Hamming distance
between $y_{i(z)}$ and $y(\mathrm{Bob})$ and $H$ denote the entropy of the stream.

We have:

$$
\begin{aligned}
H &= \frac{A}{R}\log(R) + \frac{2B(C-\Delta)}{R}\log\left(\frac{R}{2}\right) + \frac{2B\Delta}{R}\log(R) \\
&= \frac{A}{R}\log(R) + \frac{2BC}{R}\log(R) - \frac{2BC}{R} + \frac{2B\Delta}{R}.
\end{aligned}
$$

Rearranging terms gives

$$\Delta = \frac{HR}{2B} + C - \frac{A\log(R)}{2B} - C\log(R). \qquad (3.1)$$
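The relation between the stream's entropy and the Hamming distance can be checked on a toy stream. In the sketch below, logarithms are base 2, `A` unit-frequency tokens stand for the lower blocks, and each coordinate of the active block is replicated `B` times; an agreeing coordinate yields one token of frequency 2, a disagreeing one yields two tokens of frequency 1. The names and the small parameters are illustrative assumptions, and the recovery step computes $\Delta = HR/(2B) + C - A\log(R)/(2B) - C\log(R)$ with $R = A + 2BC$.

```python
import math
from collections import Counter


def empirical_entropy(freq):
    """Entropy (base 2) of the distribution p_i = a_i / ||a||_1."""
    R = sum(freq.values())
    return sum((c / R) * math.log2(R / c) for c in freq.values() if c > 0)


def build_stream(A, B, y_alice, y_bob):
    """A singleton tokens plus B copies of each coordinate from Alice and Bob."""
    freq = Counter()
    for t in range(A):
        freq[("low", t)] += 1
    for j, (bit_a, bit_b) in enumerate(zip(y_alice, y_bob)):
        for k in range(B):
            freq[(j, k, bit_a)] += 1   # Alice's token
            freq[(j, k, bit_b)] += 1   # Bob's token; same key iff the bits agree
    return freq


def recover_distance(H, A, B, C):
    """Solve the entropy identity for the Hamming distance."""
    R = A + 2 * B * C
    return H * R / (2 * B) + C - A * math.log2(R) / (2 * B) - C * math.log2(R)


if __name__ == "__main__":
    y_alice, y_bob = [1, 0, 1, 1], [1, 1, 0, 1]   # hypothetical vectors, distance 2
    A, B, C = 5, 3, len(y_alice)
    H = empirical_entropy(build_stream(A, B, y_alice, y_bob))
    print("recovered distance:", recover_distance(H, A, B, C))
```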



To decide the Gap-Hamdist instance, we must decide whether $\Delta < 1/2\varepsilon^2 - 1/\varepsilon$ or $\Delta > 1/2\varepsilon^2 +
1/\varepsilon$. By Eq. (3.1) and the fact that Bob knows $A$, $B$, $C$, and $R$, it suffices to obtain a $1/\varepsilon$-additive
approximation to $HR/(2B)$ to accomplish this goal. In other words, we need a $2B/(\varepsilon R)$-additive
approximation to $H$. Since $B/R = \Theta(\varepsilon^2)$, it suffices to obtain an additive $\Theta(\varepsilon)$-approximation to $H$.
Let $\mathcal{A}$ be a streaming algorithm which can provide an additive $\Theta(\varepsilon)$-approximation with probability
at least 11/12. Recalling that correctly deciding the Gap-Hamdist instance with probability 11/12
allows one to correctly decide the original Augmented-Indexing instance with probability 2/3
by Theorem 3.4, and given Theorem 3.3, $\mathcal{A}$ must use $\Omega(\log(N)/(\varepsilon^2\log(1/\varepsilon)))$ bits of space. As
required, the length of the vector being updated in the stream is $O(N) = O(n)$,
and the length of the stream is exactly twice the vector length, and thus $O(N) = O(m)$. ■

4 $L_0$ in turnstile streams



We describe our algorithm for multiplicatively approximating $L_0$ in the turnstile model using
$O(\varepsilon^{-2}\log(\varepsilon^2 N)(\log(1/\varepsilon) + \log\log(mM)))$ space with $O(1)$ update and reporting time. Without
loss of generality, we assume (1) is a power of 2, and (2) e > 1/(3 • N). We can assume (2) since 
otherwise one could compute Lq exactly since Lq < is an integer. In both this algorithm and 
our $F_0$ algorithm, we make use of a few lemmas analyzing a balls-and-bins random process where
$A$ good balls and $B$ bad balls are thrown into $K$ bins with limited independence (in the case of our
$L_0$ algorithm, $B$ is 0). These lemmas we occasionally refer to are in Section A.4.

4.1 A Promise Version 

We give an algorithm LogEstimator for estimating $L_0$ when promised that $L_0 \le 1/(20\varepsilon^2)$ which
works as follows. First, we assume that the universe size is $O(1/\varepsilon^4)$ since we can pairwise
independently hash the universe down to $[b/\varepsilon^4]$ for some constant $b > 0$ via some hash function. In
doing so we can assure that the indices contributing to $L_0$ are perfectly hashed with constant
probability arbitrarily close to 1 by choosing $b$ large enough. Henceforth in this subsection we assume
updates $(i, v)$ have $i \in [U']$ for $U' = O(1/\varepsilon^4)$. Let $\varepsilon' = \varepsilon/\max\{200, f\}$ for a constant $f$ appearing
in the analysis. We pick hash functions $h_1 : [U'] \to [1/(\varepsilon')^2]$ from a $c_1\log(1/\varepsilon)/\log\log(1/\varepsilon)$-wise
independent hash family and $h_2 : [U'] \to [1/(\varepsilon')^2]$ from a pairwise independent family. The value
$c_1$ is a positive constant to be chosen later, and $h_1$ is chosen from a hash family of Siegel [45] to
have constant evaluation time. The function $h_1$ should be thought of as the function that assigns
the $L_0$ items to their appropriate bins, while $h_2$ is chosen as part of a technical solution to prevent
two items with non-zero frequency that hash to the same bin from canceling each other out.

We also choose a prime p randomly in [D, D'^] for D = log(mM)/e^. Notice that for mM larger 
than some constant, by standard results on the density of primes, there are at least log(mM) / (400e^) 
primes in the interval [D,D'^]. This implies non-zero frequencies remain non-zero modulo p with 

good probability. Next, we randomly pick a vector $u \in \mathbb{F}_p^{1/(\varepsilon')^2}$.

We maintain $1/(\varepsilon')^2$ counters $C_1, C_2, \ldots, C_{1/(\varepsilon')^2}$ modulo $p$, each initialized to zero. Upon
receiving an update $(i,v)$, we do

$$C_{h_1(i)} \leftarrow (C_{h_1(i)} + v \cdot u_{h_2(i)}) \bmod p.$$

Let $I = \{i : C_i \ne 0\}$. If $|I| < 100$, our estimate of $L_0$ is $|I|$. Else, our estimate is
$\tilde{L}_0 = \ln(1-(\varepsilon')^2|I|)/\ln(1-(\varepsilon')^2)$.
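
The update rule and estimator above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `LogEstimatorSketch` is hypothetical, $K$ plays the role of $1/(\varepsilon')^2$, the prime is passed in by the caller rather than sampled as described above, and $h_1$, $h_2$ are replaced by lazily sampled fully random functions instead of the limited-independence families the analysis uses.

```python
import math
import random


class LogEstimatorSketch:
    """Toy version of the promise algorithm: K counters modulo a prime p."""

    def __init__(self, num_counters, prime, seed=0):
        self.K = num_counters            # plays the role of 1/(eps')^2
        self.p = prime                   # assumed to be prime; chosen by the caller
        self._rng = random.Random(seed)
        # Stand-ins for h1 and h2: memoised random values (fully random
        # functions, not the limited-independence families of the paper).
        self._h1 = {}
        self._h2 = {}
        # Random vector u in F_p^K.
        self.u = [self._rng.randrange(prime) for _ in range(num_counters)]
        self.C = [0] * num_counters

    def _hash(self, table, i):
        if i not in table:
            table[i] = self._rng.randrange(self.K)
        return table[i]

    def update(self, i, v):
        """Process the stream update (i, v): C[h1(i)] += v * u[h2(i)] (mod p)."""
        b = self._hash(self._h1, i)
        w = self.u[self._hash(self._h2, i)]
        self.C[b] = (self.C[b] + v * w) % self.p

    def estimate(self):
        """Return |I| if it is below 100, else ln(1 - |I|/K) / ln(1 - 1/K)."""
        nonzero = sum(1 for c in self.C if c != 0)
        if nonzero < 100:
            return float(nonzero)
        # Under the promise, |I| is far below K, so the logarithm is defined.
        return math.log(1 - nonzero / self.K) / math.log(1 - 1 / self.K)
```

For instance, inserting an item and later deleting it with the opposite update returns its counter to zero, so a stream whose updates all cancel yields the estimate 0.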

Before we analyze our algorithm, we need a few lemmas and facts. 

Lemma 4.1. Let $\mathcal{H}$ be a family of $c \cdot \log(1/\varepsilon)/\log\log(1/\varepsilon)$-wise independent hash functions
$h : [U] \to [1/\varepsilon^2]$ for a sufficiently large constant $c > 0$. Let $S \subseteq [U]$ be an arbitrary subset of
$100 \le L_0 \le 1/(20\varepsilon^2)$ distinct items. Suppose we choose a random $h \in \mathcal{H}$. For $i \in [1/\varepsilon^2]$, let
$X'_i$ be an indicator variable which is 1 if and only if there is an $x \in S$ for which $h(x) = i$. Let
$X' = \sum_{i=1}^{1/\varepsilon^2} X'_i$ and let $Y = \ln(1 - \varepsilon^2 X')/\ln(1 - \varepsilon^2)$. Then there is a constant $f > 0$ so that
$\Pr_h[|Y - L_0| > \varepsilon f L_0] \le 1/4$. Moreover, for any $x = (1 \pm c\varepsilon)\mu$, $|\ln(1 - \varepsilon^2 x)/\ln(1 - \varepsilon^2) - L_0| \le \varepsilon f L_0$
for a constant $f = f(c)$, where $\mu = \varepsilon^{-2}(1 - (1 - \varepsilon^2)^{L_0})$.

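Proof. We first prove the second statement. Recall $100 \le L_0 \le 1/(20\varepsilon^2)$, implying $\varepsilon < 1/5$.
Supposing $|x - \mu| \le c\varepsilon\mu$ for some constant $c > 0$, we have

$$
\begin{aligned}
\frac{\ln(1 - \varepsilon^2 x)}{\ln(1 - \varepsilon^2)} &= \frac{\ln((1 - \varepsilon^2)^{L_0} \pm c\varepsilon^3\mu)}{\ln(1 - \varepsilon^2)} \\
&= \frac{\ln((1 - \varepsilon^2)^{L_0})}{\ln(1 - \varepsilon^2)} \pm \frac{O(\varepsilon^3\mu)}{\ln(1 - \varepsilon^2)} \\
&= L_0 \pm O\!\left(\frac{\varepsilon^3\mu}{\varepsilon^2}\right) \\
&= L_0 \pm O(\varepsilon\mu) \\
&= (1 \pm O(\varepsilon))L_0 .
\end{aligned}
$$

The second equality holds since $\varepsilon$ is bounded away from 1, implying $y = (1 - \varepsilon^2)^{L_0}$ is bounded
away from 0, so the derivative of $\ln$ at $y$ is bounded by a constant. The third equality similarly
holds since $1 - \varepsilon^2$ is bounded away from 0, so that $\ln(1 - \varepsilon^2) = \Theta(\varepsilon^2)$. The final equality holds since
$\mu \le L_0$. The first part of the lemma follows since $|X' - \mu| \le 8\varepsilon\mu$ with probability at least 3/4 by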
Lemma lAiMl ■ 

Fact 4.2. Let $\mathbb{F}_q$ be a finite field and $v \in \mathbb{F}_q^d$ be a non-zero vector. Then, picking a vector $w$ at
random in $\mathbb{F}_q^d$ gives $\Pr[v \cdot w = 0] = 1/q$, where $v \cdot w$ is the inner product over $\mathbb{F}_q$.

Proof. The set of vectors orthogonal to $v$ is a linear subspace of $\mathbb{F}_q^d$ of dimension $d - 1$ and thus
has $q^{d-1}$ points. A random $w \in \mathbb{F}_q^d$ thus lands in this subspace with probability $1/q$. ■
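
As a sanity check of Fact 4.2 on small instances, the following sketch counts the vectors orthogonal to a given non-zero vector by exhaustive search. It assumes $q$ is prime, so that arithmetic modulo $q$ realizes $\mathbb{F}_q$; the function name `count_orthogonal` is introduced here for illustration only.

```python
from itertools import product


def count_orthogonal(v, q):
    """Count the w in F_q^d with <v, w> = 0, for prime q, by brute force."""
    d = len(v)
    return sum(
        1
        for w in product(range(q), repeat=d)
        if sum(a * b for a, b in zip(v, w)) % q == 0
    )
```

For a non-zero $v \in \mathbb{F}_q^d$ the count should be $q^{d-1}$, i.e. a $1/q$ fraction of all $q^d$ vectors.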

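Fact 4.3. Let $U, t$ be positive integers. Pick a function $h : [U] \to [t]$ from a pairwise independent
family. Then for any set $S \subseteq [U]$ of size $s \le t$, $\mathbf{E}\left[\sum_{i=1}^{t} \binom{|h^{-1}(i) \cap S|}{2}\right] \le s^2/(2t)$.

Proof. Assume $S = \{1, \ldots, s\}$. Let $X_{i,j}$ indicate $h(i) = j$. By symmetry of the $X_{i,j}$, the desired
expectation is

$$t \cdot \sum_{i < i'} \mathbf{E}[X_{i,1}]\,\mathbf{E}[X_{i',1}] = t\binom{s}{2}\frac{1}{t^2} \le \frac{s^2}{2t}.$$

■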
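
The bound of Fact 4.3 can be checked exactly on a concrete pairwise independent family. The sketch below assumes $t$ is prime and uses the family $h_{a,b}(x) = (ax + b) \bmod t$ with $a, b \in \mathbb{Z}_t$ on the domain $\mathbb{Z}_t$; this family and the helper name are illustrative choices, not something fixed by the text.

```python
from fractions import Fraction
from math import comb


def expected_colliding_pairs(S, t):
    """Exact average, over h(x) = (a*x + b) mod t with a, b in Z_t (t prime),
    of the number of colliding pairs sum_i C(|h^{-1}(i) ∩ S|, 2)."""
    total = 0
    for a in range(t):
        for b in range(t):
            loads = [0] * t
            for x in S:
                loads[(a * x + b) % t] += 1
            total += sum(comb(n, 2) for n in loads)
    return Fraction(total, t * t)
```

For distinct elements of $\mathbb{Z}_t$, each pair collides for exactly the $t$ functions with $a = 0$, so the average equals $\binom{s}{2}/t$, which is at most $s^2/(2t)$.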

To evaluate the hash function $h_1$ in constant time, we use the following theorem of Siegel, in
a form that was stated more succinctly by Dietzfelbinger and Woelfel [15].

Theorem 4.4 (Siegel [IS]). Let < < 1 and > 1 with /ifc < 1 be given. Then if C < 1 
and d > 1 satisfy C ^ ^ + ^"'"^"f llgz^"^^ ' ^ (^^^^ large enough), then there is a way of randomly 
choosing a function h : [z^] — > [z] such that the following hold: (1) the description of h comprises 
0{z'') words in [z], (2) the function h can be evaluated by XOR-ing together d^^'^ /clogz-bit words, 
and (3) the class formed by all these /I's is z'^-wise independent. 

Finally, we need the following lemma to achieve 0(1) reporting time. 

Lemma 4.5. Let $K = 1/\varepsilon^2$ be a positive integer with $\varepsilon < 1/2$. It is possible to construct a lookup
table requiring $O(\varepsilon^{-1}\log(1/\varepsilon))$ bits such that $\ln(1 - c/K)$ can then be computed with relative
accuracy $\varepsilon$ in constant time for all integers $c \in [4K/5]$.

Proof. We set $\varepsilon' = \varepsilon/15$ and discretize the interval $[1/5, 1 - \varepsilon^2]$ geometrically by powers of $(1 + \varepsilon')$.
We precompute the natural logarithm evaluated at all discretization points, with relative error $\varepsilon/3$,
taking space $O(\varepsilon^{-1}\log(1/\varepsilon))$. We answer a query $\ln(1 - c/K)$ by outputting the natural logarithm
of the closest discretization point in the table. Our output is then, up to $(1 \pm \varepsilon/3)$,

$$\ln(1 - (1 \pm \varepsilon')c/K) = \ln(1 - c/K \pm \varepsilon' c/K) = \ln(1 - c/K) \pm 5\varepsilon' c/K = \ln(1 - c/K) \pm \varepsilon c/(3K).$$

Using the fact that $|\ln(1 - z)| \ge z$ for $0 < z < 1$, we have that $|\ln(1 - c/K)| \ge c/K$. Thus,

$$(1 \pm \varepsilon/3)(\ln(1 - c/K) \pm \varepsilon c/(3K)) = (1 \pm \varepsilon/3)(1 \pm \varepsilon/3)\ln(1 - c/K) = (1 \pm \varepsilon)\ln(1 - c/K).$$

■
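
A lookup table in the spirit of Lemma 4.5 might look as follows. This is a sketch under assumptions, not the construction of the lemma verbatim: it places the geometric grid on $z = c/K$ (the quantity perturbed in the displayed calculation), stores table entries as floating-point numbers rather than $O(\log(1/\varepsilon))$-bit values, and locates the nearest grid point with a floating-point logarithm where a word-RAM implementation would use bit operations. The function names are hypothetical.

```python
import math


def build_log_table(eps):
    """Precompute ln(1 - z) at grid points z = (1/K) * (1 + eps/15)^j."""
    K = round(1 / eps ** 2)
    step = 1 + eps / 15                      # eps' = eps / 15
    z_min = 1 / K
    n = math.ceil(math.log(0.8 * K, step)) + 1
    table = [math.log(1 - z_min * step ** j) for j in range(n)]
    return K, step, table


def lookup_log(c, K, step, table):
    """Approximate ln(1 - c/K) for an integer 1 <= c <= 4K/5 by table lookup."""
    # Index of the grid point nearest to c/K on a logarithmic scale.
    j = round(math.log(c, step))
    return table[min(j, len(table) - 1)]
```

The nearest grid point is within a factor $(1 + \varepsilon')^{1/2}$ of $c/K$, so the same derivative argument as in the proof bounds the relative error well below $\varepsilon$.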

Now we analyze LogEstimator. 

Theorem 4.6. Ignoring the space to store $h_3$, LogEstimator uses space $O(\varepsilon^{-2}(\log(1/\varepsilon) +
\log\log(mM)))$. The update and reporting times are $O(1)$. If $L_0 \le 1/(20\varepsilon^2)$ then LogEstimator
outputs a value $\tilde{L}_0 = (1 \pm \varepsilon)L_0$ with probability at least 3/5.

Proof. The vector $u$ takes $O(\varepsilon^{-2}\log p) = O(\varepsilon^{-2}(\log(1/\varepsilon) + \log\log(mM)))$ bits to store. Each
counter $C_i$ takes space $O(\log p)$ and there are $O(1/\varepsilon^2)$ counters, thus also requiring
$O(\varepsilon^{-2}(\log(1/\varepsilon) + \log\log(mM)))$ total space. The hash function $h_2$ requires $O(\log(1/\varepsilon))$ space.

For the update time, for each stream token we must evaluate three hash functions. The hash
functions $h_2, h_3$ each take constant time. For $h_1$, we can use the hash family of Theorem 4.4 with
$z = 1/\varepsilon^2$, $k = 2 + o(1)$, $\mu = 1/8$, $\zeta = 1/2$, $d = 9$. We then have that $h_1$ is $1/\varepsilon^{1/4}$-wise independent,
which is $c_1\log(1/\varepsilon)/\log\log(1/\varepsilon)$-wise independent for $\varepsilon$ smaller than some constant. Also, $h_1$ can
be evaluated in constant time, and it requires $O(\varepsilon^{-1}\log(1/\varepsilon))$ bits of storage. This storage is
dominated by the amount of storage required just to hold the counters $C_i$. We must also multiply
by a coordinate of $u$ fitting in a word, taking constant time.

For the reporting time, we can precompute $\ln(1 - (\varepsilon')^2)$ during preprocessing. To compute
$\ln(1 - (\varepsilon')^2|I|)$, first note that we can maintain $|I|$ in constant time during updates using an
$O(\log(1/\varepsilon))$-bit counter. Also note that

$$\mathbf{E}[|I|] \le \frac{1}{(\varepsilon')^2}\left(1 - (1 - (\varepsilon')^2)^{L_0}\right) \le \frac{1}{400(\varepsilon')^2}.$$

Thus, by Markov's inequality, $|I| \le 1/(4(\varepsilon')^2)$ with probability at least 99/100, and we can use
a lookup table as in Lemma 4.5 to compute the natural logarithm. The space required to store the
lookup table is dominated by the space used in other parts of the algorithm.

We now prove correctness. First, we handle the case $100 \le L_0 \le 1/(20\varepsilon^2)$.

Let $S$ be the set of $L_0$ indices $j \in [U']$ with $x_j \ne 0$ at the end of the stream.

Let $Q$ be the event that $p$ does not divide any $|x_j|$.

Let $Q'$ be the event that $h_2(j) \ne h_2(j')$ for distinct indices $j, j' \in S$ with $h_1(j) = h_1(j')$.

Henceforth, we condition on both $Q$ and $Q'$ occurring, which we later show holds with good
probability. Define $I \subseteq [1/(\varepsilon')^2]$ by $I = \{i : h_1^{-1}(i) \cap S \ne \emptyset\}$, that is, $I$ is the image of $S$ under
$h_1$. For each $i \in I$, $C_i$ can be viewed as maintaining the dot product of a non-zero vector $v$ in $\mathbb{F}_p^{L_0}$,
the frequency vector $x$ restricted to coordinates in $S$, with a random vector $w$, namely, the vector
obtained by restricting $u$ to coordinates in $S$. The vector $v$ is non-zero since we condition on $Q$,
and $w$ is random since we condition on $Q'$.

Let $Q''$ be the event that no $C_i$ is zero for $i \in I$.

Conditioned on $Q$, $Q'$, and $Q''$, we can apply Lemma 4.1, and since $\varepsilon' \le \varepsilon/f$, our estimate $\tilde{L}_0$
of $L_0$ will satisfy $|\tilde{L}_0 - L_0| \le \varepsilon L_0$ with probability at least 3/4.
Now we analyze the probability that $Q$, $Q'$, and $Q''$ all occur. Each $|x_j|$ is at most $mM$ and thus
has at most $\log(mM)$ prime factors. Thus, there are at most $L_0\log(mM) \le \log(mM)/(20\varepsilon^2)$ prime
divisors that divide some $|x_j|$, $j \in S$. By our choice of $p$, we pick such a prime with probability at
most 1/20, and thus $\Pr[Q] \ge 19/20$.

Now, let $X_{j,j'}$ be a random variable indicating that $h_1(j) = h_1(j')$ for distinct $j, j' \in S$. Let
$X = \sum_{j<j'} X_{j,j'}$. By Fact 4.3 with $U = U'$, $t = 1/(\varepsilon')^2 \ge 1/\varepsilon^2$ and $s = L_0 \le 1/(20\varepsilon^2)$, we have that
$\mathbf{E}[X] \le 1/(800\varepsilon^2)$. Let $J = \{(j,j') \in \binom{S}{2} : h_1(j) = h_1(j')\}$. For $(j,j') \in J$ let $Y_{j,j'}$ be a random
variable indicating $h_2(j) = h_2(j')$, and let $Y = \sum_{(j,j') \in J} Y_{j,j'}$. Then by pairwise independence
of $h_2$, $\mathbf{E}[Y] = \sum_{(j,j') \in J} \Pr[h_2(j) = h_2(j')] = |J|(\varepsilon')^2 \le |J|\varepsilon^2$. Note $|J| = X$. Conditioned on
$X \le 20\,\mathbf{E}[X] \le 1/(40\varepsilon^2)$, which happens with probability at least 19/20 by Markov's inequality,
we have that $\mathbf{E}[Y] \le |J|\varepsilon^2 \le 1/40$, so that $\Pr[Y \ge 1] \le 1/40$. Thus, $Q'$ holds with probability at
least $(19/20) \cdot (39/40) > 7/8$.

Finally, by Fact 4.2 with $q = p$, and union bounding over all $1/(\varepsilon')^2$ counters $C_i$, $Q''$ holds with
probability at least $1 - 1/((\varepsilon')^2 p) \ge 99/100$. Thus, $\Pr[Q \wedge Q' \wedge Q''] = \Pr[Q \wedge Q']\Pr[Q'' \mid Q \wedge Q'] \ge
(19/20) \cdot (7/8) \cdot (99/100) > 4/5$ (notice that $Q$ and $Q'$ are independent). The algorithm thus
succeeds with probability at least $(4/5) \cdot (3/4) = 3/5$ in this case.

Now we consider the case $L_0 < 100$. If the elements of $S$ are perfectly hashed and $Q$ holds, we
output $L_0$ exactly. By choice of $\varepsilon'$, $1/(\varepsilon')^2 \ge (200)^2$. Thus, all elements of $S$ are perfectly hashed
with probability at least 7/8 by pairwise independence of $h_1$. We already saw that $\Pr[Q] \ge 19/20$,
so we output $L_0$ exactly with probability at least $(7/8) \cdot (19/20) > 3/5$. ■

4.2 A Rough Estimator 

For our full algorithm to function, we need to run in parallel a subroutine giving a constant-factor
approximation to $L_0$. We describe here a subroutine RoughEstimator which does exactly
this. First, we need the following lemma which states that when $L_0$ is at most some constant
$c$, it can be computed exactly in small space. The lemma follows by picking a random prime
$p = O(\log(mM)\log\log(mM))$ and pairwise independently hashing the universe into $[O(c^2)]$ buckets.
Each bucket is a counter which tracks the sum of frequencies modulo $p$ of updates to universe
items landing in that bucket. The estimate of $L_0$ is then the total number of non-zero counters,
and the maximum estimate after $O(\log(1/\eta))$ trials is finally output. This gives the following.

Lemma 4.7. There is an algorithm which, when given the promise that $L_0 \le c$, outputs $L_0$
exactly with probability at least $1 - \eta$ using $O(c^2\log\log(mM))$ space, in addition to needing to
store $O(\log(1/\eta))$ independently chosen pairwise independent hash functions mapping $[U]$ onto $[c^2]$.
The update and reporting times are $O(1)$.
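
The bucket-counting idea behind Lemma 4.7 can be sketched as below. All concrete choices are assumptions made for illustration: the class name `SmallL0Counter`, the short list of candidate primes standing in for a random prime of the stated magnitude, and the hash family $x \mapsto ((ax + b) \bmod P) \bmod c^2$ with the Mersenne prime $P = 2^{31} - 1$.

```python
import random


class SmallL0Counter:
    """Counts non-zero buckets of frequency sums modulo a prime, over several trials."""

    def __init__(self, c, trials, primes=(10007, 10009), seed=0):
        rng = random.Random(seed)
        self.buckets = c * c
        self.p = rng.choice(primes)          # stand-in for a random prime p
        self.P = 2_147_483_647               # modulus of the hash family (assumed)
        self.hashes = [
            (rng.randrange(1, self.P), rng.randrange(self.P)) for _ in range(trials)
        ]
        self.counters = [[0] * self.buckets for _ in range(trials)]

    def update(self, x, v):
        """Add v to the bucket of item x in every trial, modulo p."""
        for (a, b), row in zip(self.hashes, self.counters):
            j = ((a * x + b) % self.P) % self.buckets
            row[j] = (row[j] + v) % self.p

    def estimate(self):
        """Maximum, over the trials, of the number of non-zero buckets."""
        return max(sum(1 for cnt in row if cnt != 0) for row in self.counters)
```

Each trial can only undercount (through collisions or a frequency divisible by $p$), which is why the maximum over the trials is reported.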

Now we describe RoughEstimator. We pick a function $h : [U] \to [N]$ at random from a
pairwise independent family. For each $0 \le j \le \log N$ we create a substream $S^j$ consisting of those
$x \in [U]$ with $\mathrm{lsb}(h(x)) = j$. Let $L_0(S)$ denote $L_0$ of the substream $S$. For each $S^j$ we run an
instantiation $B^j$ of Lemma 4.7 with $c = 141$ and $\eta = 1/16$. All instantiations share the same
$O(\log(1/\eta))$ hash functions $h^1, \ldots, h^{O(\log(1/\eta))}$.

To obtain our final estimate of $L_0$ for the entire stream, we find the largest value of $j$ for which
$B^j$ declares $L_0(S^j) > 8$. Our estimate of $L_0$ is $\tilde{L}_0 = 2^j$. If no such $j$ exists, we estimate $\tilde{L}_0 = 1$.
Finally, we run this entire procedure $O(1)$ times and take the median estimate.

Theorem 4.8. With probability at least 99/100 RoughEstimator outputs a value $\tilde{L}_0$ satisfying
$L_0 \le \tilde{L}_0 \le 110L_0$. The space used is $O(\log(N)\log\log(mM))$, and the update and reporting times
are $O(1)$.

Proof. We first analyze one instantiation of RoughEstimator. The space to store $h$ is $O(\log N)$.
The $\Theta(\log(1/\eta))$ hash functions $h^i$ in total require $O(\log(1/\eta)\log U) = O(\log N)$ bits to store since
$1/\eta = O(1)$. The remaining space to store a single $B^j$ for a level is $O(\log\log(mM))$ by Lemma 4.7,
and thus storing all $B^j$ across all levels requires space $O(\log(N)\log\log(mM))$.

As for running time, upon receiving a stream update $(x, v)$, we first hash $x$ using $h$, taking time
$O(1)$. Then, we compute $\mathrm{lsb}(h(x))$, also in constant time [8, 19]. Now, given our choice of $\eta$ for
$B^j$, we can update $B^j$ in $O(1)$ time by Lemma 4.7.

To obtain $O(1)$ reporting time, we again use the fact that we can compute the least significant
bit of a machine word in constant time. We maintain a single machine word $z$ of at least $\log N$
bits and treat it as a bit vector. We maintain that the $j$th bit of $z$ is 1 iff $L_0(S^j)$ is reported to
be at least 8 by $B^j$. This property can be maintained in constant time during updates. Constant
reporting time then follows since finding the deepest level $j$ with at least 8 reported elements is
equivalent to computing $\mathrm{lsb}(z)$.
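
The bit-vector bookkeeping can be illustrated with a small sketch. It assumes one particular bit order, namely that level $j$ is stored at bit position (number of levels $- 1 - j$), so that the deepest flagged level corresponds to the least significant set bit; the text does not spell out the order, and the names `lsb` and `LevelFlags` are introduced here for illustration.

```python
def lsb(x, width):
    """Index of the least significant set bit of x, or `width` if x == 0."""
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


class LevelFlags:
    """One word z whose bits record which levels currently report >= 8 items."""

    def __init__(self, num_levels):
        self.num_levels = num_levels
        self.z = 0

    def set_flag(self, j, on):
        # Assumed layout: level j lives at bit (num_levels - 1 - j).
        bit = 1 << (self.num_levels - 1 - j)
        self.z = (self.z | bit) if on else (self.z & ~bit)

    def deepest(self):
        """Largest flagged level, or None if no level is flagged."""
        if self.z == 0:
            return None
        return self.num_levels - 1 - lsb(self.z, self.num_levels)
```

Python integers are unbounded, so `x & -x` isolates the lowest set bit just as it does for two's-complement machine words.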

Now we prove correctness. Observe that $\mathbf{E}[L_0(S^j)] = L_0/2^{j+1}$ when $j < \log N$ and
$\mathbf{E}[L_0(S^j)] = L_0/2^j = L_0/N$ when $j = \log N$. Let $j^*$ be the largest $j$ satisfying $\mathbf{E}[L_0(S^j)] \ge 1$ and note that
$1 \le \mathbf{E}[L_0(S^{j^*})] \le 2$. For any $j > j^*$, $\Pr[L_0(S^j) > 8] < 1/(8 \cdot 2^{j-j^*-1})$ by Markov's inequality.
Thus, by a union bound, the probability that any $j > j^*$ has $L_0(S^j) > 8$ is at most
$(1/8) \cdot \sum_{j-j^*=1}^{\infty} 2^{-(j-j^*-1)} = 1/4$. Now, let $j^{**} < j^*$ be the largest $j$ such that $\mathbf{E}[L_0(S^j)] \ge 55$, if such a
$j$ exists. Since we increase the $j$ by powers of 2, we have $55 \le \mathbf{E}[L_0(S^{j^{**}})] < 110$. Note that $h$ is
pairwise independent, so $\mathbf{Var}[L_0(S^{j^{**}})] \le \mathbf{E}[L_0(S^{j^{**}})]$. For this range of $\mathbf{E}[L_0(S^{j^{**}})]$, we then have
by Chebyshev's inequality that

$$\Pr\left[\,\left|L_0(S^{j^{**}}) - \mathbf{E}[L_0(S^{j^{**}})]\right| \ge 3\sqrt{\mathbf{E}[L_0(S^{j^{**}})]}\,\right] \le 1/9.$$

If $|L_0(S^{j^{**}}) - \mathbf{E}[L_0(S^{j^{**}})]| < 3\sqrt{\mathbf{E}[L_0(S^{j^{**}})]}$, then

$$32 < 55 - 3\sqrt{55} \le L_0(S^{j^{**}}) \le 110 + 3\sqrt{110} < 142$$

since $55 \le \mathbf{E}[L_0(S^{j^{**}})] < 110$.

So far we have shown that with probability at least 3/4, $L_0(S^j) \le 8$ for all $j > j^*$. Thus, for
these $j$ the $B^j$ will estimate $L_0$ of the corresponding substreams to be at most 8, and we will not
output $\tilde{L}_0 = 2^j$ for $j > j^*$. On the other hand, we know for $j^{**}$ (if it exists) that with probability at
least 8/9, $S^{j^{**}}$ will have $32 < L_0(S^{j^{**}}) < 142$. By our choice of $c = 141$ and $\eta = 1/16$ in the $B^j$, $B^{j^{**}}$
will output a value $\tilde{L}_0(S^{j^{**}}) \ge L_0(S^{j^{**}})/4 \ge 8$ with probability at least $1 - (1/9 + 1/16) > 13/16$
by Lemma 4.7. Thus, with probability at least $1 - (3/16 + 1/4) = 9/16$, we output $\tilde{L}_0 = 2^j$ for
some $j^{**} \le j \le j^*$, which satisfies $2^j/110 \le L_0 \le 2^j$. If such a $j^{**}$ does not exist, then $L_0 < 55$,
and thus 1 serves as a 55-approximation in this case.

Since one instantiation of RoughEstimator gives the desired approximation with constant
probability strictly greater than 1/2 (i.e. 9/16), the theorem follows by taking the median of a
constant number of independent instantiations and applying a Chernoff bound. ■


4.3 Putting the Final Algorithm Together 

Our full algorithm FullAlg for estimating $L_0$ works as follows. Set $\varepsilon' = \varepsilon/420$. Choose a
$c_1\log(1/\varepsilon')/\log\log(1/\varepsilon')$-wise independent hash function $h_1$, pairwise independent hash functions
/i2, /13, and random prime p G [D, D"^] for D = log(mM)/e^, as is required by LogEstimator. We 
run an instantiation LE of LogEstimator with desired error $\varepsilon'$, an instantiation RE of
RoughEstimator, and $\log N - \log(1/(\varepsilon')^2) = \log((\varepsilon')^2 N)$ instantiations $\mathrm{LE}_0, \ldots, \mathrm{LE}_{\log((\varepsilon')^2 N)}$ of
LogEstimator in parallel with the promise $L_0 \le 1/(20(\varepsilon')^2)$ and desired error $\varepsilon'$. All instantiations of
LogEstimator share the same $h_1$, $h_2$, $h_3$, and prime $p$. We pick a hash function $h : [U] \to [N]$ at
random from a pairwise independent family of hash functions. For each update $(i,v)$ in the stream,
we feed the update to both LE and RE. Also, if the length $j$ of the longest suffix of zeroes in $h(i)$
is at most $\log(1/(\varepsilon')^2)$, we feed the update $(i,v)$ to $\mathrm{LE}_j$.

Let $R$ be the estimate of $L_0$ provided by RE. If $R \le 1/(20(\varepsilon')^2)$, we output the estimate
provided by LE. Otherwise, we output the estimate of Lq provided by LE|-iog(j:jy(44Qo(e/)2))-| . To 
analyze our algorithm, we first prove the following lemma. 

Lemma 4.9. Let j be a level such that < E[Lo(cS/)]. Then \2^' Lo{Si) - Lo| < 2eLo/3 with 

probability at least 7/8 for j' = j when j = log A^, and j' = j + 1 otherwise. 

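Proof. Let $S = \{i : x_i \ne 0 \text{ at the end of the stream}\}$ and for $i \in S$ let $X_{i,j}$ be a random variable
indicating that $i$ is hashed to the substream at level $j$, and let $X_j = \sum_{i \in S} X_{i,j}$. We assume here
$j < \log N$ since the proof is nearly identical for $j = \log N$. Then we have $\mathbf{E}[X_j] = L_0/2^{j+1}$ and by
pairwise independence of $h$, $\mathbf{Var}[X_j] \le \mathbf{E}[X_j]$. Thus by Chebyshev's inequality,

$$\Pr\left[\,|2^{j+1}X_j - L_0| \ge 2\varepsilon L_0/3\,\right] \le \frac{9}{4\varepsilon^2\,\mathbf{E}[X_j]} \le \frac{1}{8}.$$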

■ 

Now we prove our main theorem for Lq estimation. 

Theorem 4.10. FullAlg uses space $O(\varepsilon^{-2}\log(\varepsilon^2 N)(\log(1/\varepsilon) + \log\log(mM)))$, has $O(1)$ update
and reporting times, and $(1 \pm \varepsilon)$-approximates $L_0$ with probability at least 3/4.

Proof. We analyze one instantiation of FullAlg. The space and time requirements follow
from Theorem 4.6 and Theorem 4.8, and the fact that the hash functions $h$, $h_3$ can be stored in
$O(\log U) = O(\log N)$ bits and can be evaluated in constant time.

As for correctness, with probability at least 99/100, the value $R$ returned by RE satisfies
$L_0 \le R \le 110L_0$ by Theorem 4.8. We henceforth condition on this occurring. If $R \le 1/(20(\varepsilon')^2)$ then
$L_0 \le 1/(20(\varepsilon')^2)$, so LE outputs $(1 \pm \varepsilon')L_0 = (1 \pm \varepsilon)L_0$ with probability at least 3/5 by Theorem 4.6.
Otherwise, we output the estimate of Lq provided by LEj for j = |'log(i?/(4400(e')^))] . Let Lq 
denote the expected value Lq of the substream at level j. For our choice of j, Lo/(8800(e')^) < 
E[L^] < Lo/(40(e')^)- By Lemma and choice of e', (1 ± (2e/3))Lo/(8800(e')^) < Lq < (1 ± 
(2e/3))Lo/(40(e')^) < Lo/(20(e')^) with probability at least 7/8. By Theorem HSl conditioned 
on L^ < Lo/(20e')2 and by choice of e' , we have that LE^ outputs (1 ± e')^o = (1 ± (e/420))L^ 
with probability at least 3/5. Again by Lemma 14.91 using that 20/e2 < l/(8800(e')^) by choice of 
e' < e/420, we have that 2^~^^Lq serves as a (1 ± (e/420))(l ± (2e/3))-approximation to Lq in this 
case, which is at most $(1 \pm \varepsilon)$ for $\varepsilon$ smaller than some constant. Thus, in the case $R > 1/(20(\varepsilon')^2)$,
FullAlg outputs a valid approximation with probability at least $(3/5) \cdot (7/8) > 33/64$. Thus, in
total, the algorithm outputs a valid approximation with probability at least $(99/100) \cdot (33/64)$ (since
we conditioned on $R$ being a valid approximation), which is strictly bigger than 1/2. The theorem
follows by repeating a constant number of instantiations of FullAlg in parallel and returning the
median result. ■

When given 2 passes, in the first pass we can obtain $R$, then in the second pass we need only
instantiate $\mathrm{LE}_j$ for the appropriate level $j$, thus avoiding the $\log(\varepsilon^2 N)$ factor blowup in space from
maintaining $\log(\varepsilon^2 N)$ different $\mathrm{LE}_j$. Thus we have the following theorem.

Theorem 4.11. There is an algorithm $(1 \pm \varepsilon)$-approximating $L_0$ in 2 passes with probability 3/4,
using space $O(\varepsilon^{-2}(\log(1/\varepsilon) + \log\log(mM)) + \log N)$, with $O(1)$ update and reporting times.

Note that when combined with Theorem 3.6, Theorem 4.11 shows a separation between the
space complexity of 1 and 2 passes for $L_0$ for a large range of settings of $\varepsilon$ and $mM$.



5 $L_0$ in update-only streams

Here we describe an algorithm for estimating $F_0$, the number of distinct items in an update-only
stream. Our main result is the following. The space bound is never more than an $O(\log\log N)$ factor
away from optimal, for any $\varepsilon$.

Theorem 5.1. There is an algorithm for $(1 \pm \varepsilon)$-approximating $F_0$ with probability 2/3 in space
$O(\varepsilon^{-2}\log\log(\varepsilon^2 N) + \log(1/\varepsilon)\log(N))$. The update and reporting times are both $O(1)$.

The algorithm works as follows. We allocate $K = 1/\varepsilon^2$ counters $C_1, \ldots, C_K$ initialized to null,
each capable of holding an integer in $[\log(\varepsilon^2 N) + 1]$, and we pick an $O(\log(1/\varepsilon)/\log\log(1/\varepsilon))$-wise
independent hash function $h_1 : [1/\varepsilon^4] \to [K]$. We also pick pairwise independent hash functions
$h_2 : [U] \to [1/\varepsilon^4]$ and $h_3 : [U] \to [N]$. We run Algorithm I of [6] to obtain a value $F_0/2 \le R \le F_0$
with probability 99/100, taking $O(\log N + \log\log n)$ space and having constant update and reporting
time. Upon seeing an item $i \in [U]$ in the stream, we set

$$C_{h_1(h_2(i))} \leftarrow \max\{C_{h_1(h_2(i))}, \min\{\log(\varepsilon^2 N) + 1, \mathrm{lsb}(h_3(i))\}\}.$$

We also maintain $\log(\varepsilon^2 N)$ counters $Y_1, \ldots, Y_{\log(\varepsilon^2 N)}$, where $Y_r$ tracks $|\{j : C_j = r\}|$. To estimate
$F_0$ there are three cases. If $R < 100$, we output $|\{j : C_j \ne \text{null}\}|$. Else, if $100 \le R \le K/40$, we
output $\ln(1 - |\{j : C_j \ne \text{null}\}|/K)/\ln(1 - 1/K)$. Otherwise, let $r$ be the smallest positive integer
such that $R/2^r \le K/40$. We define $f(A) = K((1 - 1/K)^A - (1 - 1/K)^{2A})$ and output $2^r A$ for the
smallest $A$ with $f(A) = Y_r$. For time efficiency, $h_1$ is chosen from a hash family of Siegel [45] to
have $O(1)$ evaluation time.
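
The counter update and the bookkeeping of the $Y_r$ can be sketched as follows. This is an illustration under assumptions: the class name `F0Counters` is hypothetical, `max_level` plays the role of $\log(\varepsilon^2 N) + 1$, the composition $h_1(h_2(i))$ and the level $\mathrm{lsb}(h_3(i))$ are replaced by memoised random values (the level is the lsb of a random 64-bit word), and $Y_r$ is tracked for every level including 0.

```python
import random


def lsb(x, width=64):
    """Index of the least significant set bit of x, or `width` if x == 0."""
    return width if x == 0 else (x & -x).bit_length() - 1


class F0Counters:
    """K max-counters plus the histogram Y[r] = |{j : C[j] == r}|."""

    def __init__(self, K, max_level, seed=0):
        self.K = K
        self.max_level = max_level
        self.C = [None] * K                  # counters, initially null
        self.Y = [0] * (max_level + 1)
        self._rng = random.Random(seed)
        self._bin = {}                       # stand-in for h1(h2(i))
        self._lvl = {}                       # stand-in for lsb(h3(i))

    def update(self, i):
        if i not in self._bin:
            self._bin[i] = self._rng.randrange(self.K)
            self._lvl[i] = min(self.max_level, lsb(self._rng.getrandbits(64)))
        j, new = self._bin[i], self._lvl[i]
        old = self.C[j]
        if old is None or new > old:
            # At most one counter changes; fix the two affected Y entries.
            if old is not None:
                self.Y[old] -= 1
            self.C[j] = new
            self.Y[new] += 1
```

Since repeated occurrences of an item map to the same bin and level, they never change the state, which is what makes the structure count distinct items.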

We now analyze our algorithm. First, we need the following two lemmas, whose proofs are in 
Section IA.4.11 

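Lemma 5.2. Fix $x \ge 2$. Consider the function

$$f(y) = x\left(\left(1 - \frac{1}{x}\right)^{y} - \left(1 - \frac{1}{x}\right)^{2y}\right).$$

If $y \le x/3$, then $f'(y) \ge 1/9$.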




®The space and time bounds are not listed this way in because (1) they do not assume the word RAM model, 
and (2) they do not ensure U = 0(log TV) but rather just use a universe of size n. 
Lemma 5.3. Let $0 < \varepsilon < 1/2$ and suppose $(1 - \varepsilon)B \le B' \le (1 + \varepsilon)B$ with $0 < B < K$ for integers
B,B'. liA,K>0 then 

K ^1 

We now prove correctness of our Fq algorithm and analyze the update and reporting times and 
space usage. 

Proof (of Theorem 5.1). Our use of an algorithm of [6] to obtain $R$ requires $O(\log N)$ space
and adds $O(1)$ to both the update and reporting time. Now we analyze the rest of our algorithm.

First we analyze space requirements. We maintain $K$ counters $C_j$, each holding an integer
in $[\log(\varepsilon^2 N) + 1]$ (or null), taking $O(\varepsilon^{-2}\log\log(\varepsilon^2 N))$ bits. Storing $h_1$ from Siegel's family takes
$O(\varepsilon^{-1}\log(1/\varepsilon)) = o(\varepsilon^{-2})$ bits, as in the proof of Theorem 4.6. The functions $h_2, h_3$ combined
require $O(\log N + \log(1/\varepsilon))$ bits. Finally, the last bits of required storage come from storing the
$Y_r$, which in total take $O(\log(K)\log N) = O(\log(1/\varepsilon)\log(N))$ bits.

Now we analyze update time. Each update requires evaluating each of $h_1, h_2, h_3$ once, taking
$O(1)$ time. We also compute the lsb of an integer fitting in a word, taking $O(1)$ time using [8, 19].
Finally, we have to maintain the $Y_r$. During an update, we change the value of at most one $C_j$,
from, say, $r$ to $r'$. This just requires decrementing $Y_r$ and incrementing $Y_{r'}$.

Before analyzing reporting time, we prove correctness. We condition on the event $Q$ that
$F_0/2 \le R \le F_0$. If $R < 100$, then $F_0 < 200$ and the distinct elements are perfectly hashed with 7/8
probability (for $\varepsilon$ sufficiently small), and we estimate $F_0$ exactly in this case. If $100 \le R \le K/20$,
correctness follows from Lemma 4.1. We now consider $R > K/20$. We consider the level $r$ with

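$$\frac{1}{80\varepsilon^2} \le \frac{R}{2^r} \le \frac{1}{40\varepsilon^2}$$

and thus

$$\frac{1}{80\varepsilon^2} \le \frac{F_0}{2^r} \le \frac{1}{20\varepsilon^2}.$$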

Letting Fq be the number of distinct elements mapped to level r, we condition on the event Q' that 
Fq = (1 lb 50e)Fo/2^'. For s sufficiently small, this implies 

^0 ^ T^, ^ Fo 



<^<F'<-^< 



160e2 2'-+i - " - 2'-i - 10^2 ■ 

We also let Fq be the number of distinct elements mapped to levels r' > r and condition on the 
event Q" that Fq' = (1 ± 50e)Fo/2^. This similarly implies 

Next, we condition on the event Q'" that the Fq + Fq < l/(5e2) items at levels r and greater are 
perfectly hashed under /12. Now we use our analysis of the balls and bins random process described 
in Section IA.4I with A = Fq "good balls" and B = Fq "bad balls" . Let X' be the random variable 
counting the number of bins Cj hit by good balls under hi. By Lemma lA.171 and Lemma lA.22t 
E[X] = (1 ± e)/i with 
We define the event $Q''''$ that $X' = (1 \pm 4002\varepsilon)\mu$.
Recah the definition of the function 



f{A) = 1 - 1 




Conditioned on Q, Q', ^ = (1 ± 100e)B, and thus by LemmaES = (1 ± 200e)/(A). 
Conditioned on Q"" , 

\X' - f{A)\ < \X' - ^1 + 1^ - f{A)\ < 4002e^ + 200e/(A) < 4202eK, 

in which case also \X' - f{A)\ < Jf/lOOO for e sufficiently smah, implying K/imQ < X' < /f/9. 
The lower bound holds since f{A) > iC/500 by Lemma IA.201 and the upper bound holds since 
f{A) < A < A'/IO. We also note /(i^/3) > K{e-^l^ - e'^/^ - l/K) by Lemma [Al9l which is 
at least for K sufficiently large (i.e. e sufficiently small). Thus, there exists A' < K/3 with 
f{A') = X'. Furthermore, by Lemma 15.21 is the unique inverse in this range. Also, in the range 
where we invert X' , the derivative of / is lower bounded by 1/9, and so 

\f~^{X') - A\<{9- 4202)eK < W'^eA. 

Thus, we can compute A with relative error W'^eA, and so 2^ A = (1 it 50e)(l it W'^e)FQ. We can 
thus obtain (1 it e)FQ by running our algorithm with error parameter e' = ce for c a sufficiently 
small constant. Thus, our algorithm is correct as long as Q, Q', Q" , Q'" , Q"" all occur. 

Now we analyze the probability that all these events occur. We already know $\Pr[Q] \ge 99/100$
by our choice of failure probability when running the algorithm of [6]. By Chebyshev's inequality,

^ r^/,^n S'' 80 19 
Pr O' O > 1 > 1 > — 

and the exact same computation holds for lower bounding Pr[Q"|Q]. 

Now we bound $\Pr[Q''' \mid Q' \wedge Q'']$. Arbitrarily label the $z = F_0' + F_0''$ balls as $1, 2, \ldots, z$ with
$z \le K/5$. Let $Z_{i,j}$ indicate that $h_2(i) = h_2(j)$. Then the expected number of collisions is at most
$((K/5)^2/2) \cdot (1/K^2) = 1/50$. Thus, by Markov's inequality, $\Pr[Q''' \mid Q' \wedge Q''] \ge 49/50$.

By Lemma |A.18[ Lemma lA. 20) and Lemma lA.221 

E[X'] > (1 - e)K/500, Yar[X'] <7K + e^ 
and thus by Chebyshev's inequality, 

Vr\\X' - E[X'1| < 4000eE[X'l|Q'"l > 1 ^ ^, , > — 

^' L Ji - L Ji-^J- 40002e2(i _e)2(^/500)2 - 16 

with the last inequality holding for e sufficiently small. When \X' — E[X']| < 4000eE[X'] occurs, 
then X' = (1 ib4002e)/i, implying Q"" occurs. Thus, by the above and exploiting independence of 
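some of the events,

$$
\begin{aligned}
\Pr[Q \wedge Q' \wedge Q'' \wedge Q''' \wedge Q''''] &\ge \Pr[Q] \cdot \left(1 - \Pr[\overline{Q'} \mid Q] - \Pr[\overline{Q''} \mid Q]\right) \\
&\qquad \cdot \Pr[Q''' \mid Q \wedge Q' \wedge Q''] \cdot \Pr[Q'''' \mid Q \wedge Q' \wedge Q'' \wedge Q'''] \\
&= \Pr[Q] \cdot \left(1 - \Pr[\overline{Q'} \mid Q] - \Pr[\overline{Q''} \mid Q]\right) \\
&\qquad \cdot \Pr[Q''' \mid Q' \wedge Q''] \cdot \Pr[Q'''' \mid Q'''] \\
&\ge \frac{99}{100} \cdot \left(1 - \frac{2}{20}\right) \cdot \frac{49}{50} \cdot \frac{13}{16} \\
&> 2/3.
\end{aligned}
$$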

Finally, we analyze the reporting time. Recall we can query for $R$ in constant time. In the case
$R < 100$, we output the number of non-null bins, which we can maintain in constant time during
updates using an $O(\log(1/\varepsilon))$-bit counter. For $100 \le R \le K/40$, our reporting time is $O(1)$ by
using Lemma 4.5. Otherwise, we need to find the smallest positive $A$ satisfying
$K((1 - 1/K)^A - (1 - 1/K)^{2A}) = Y_r$. For this we can discretize the interval $I = [f(K/1000), f(K/9)]$ into $\Theta(1/\varepsilon)$
evenly-spaced points $V$ and precompute $f^{-1}(p)$ for all $p \in V$ during preprocessing. We can then
compute $f^{-1}(x)$ for any $x \in I$ by table lookup, using the nearest element of $V$ to $x$, thus inverting
$f$ with at most an additive $\pm\varepsilon K/160 = \pm\varepsilon A$ error. Note we argued above that $X'$ will be in $I$
conditioned on the good events. Also, this upper bound on the error suffices for our algorithm's
correctness. ■
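
The table-based inversion can be sketched as follows. The sketch takes $f(A) = K((1 - 1/K)^A - (1 - 1/K)^{2A})$, treats $A$ as a real number, fills the table by bisection on $[0, K/3]$ (where $f$ is increasing), and uses $\lceil 80/\varepsilon \rceil + 1$ grid points; the constant 80 and all function names are assumptions made for illustration, chosen so that the lookup error stays below $\varepsilon K/160$.

```python
import math


def f(A, K):
    """f(A) = K * ((1 - 1/K)^A - (1 - 1/K)^(2A))."""
    q = 1 - 1 / K
    return K * (q ** A - q ** (2 * A))


def invert_by_bisection(x, K, iters=60):
    """Solve f(A) = x for A in [0, K/3], where f is increasing."""
    lo, hi = 0.0, K / 3
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid, K) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def build_inverse_table(eps):
    """Precompute f^{-1} on an evenly spaced grid over [f(K/1000), f(K/9)]."""
    K = round(1 / eps ** 2)
    lo, hi = f(K / 1000, K), f(K / 9, K)
    n = math.ceil(80 / eps) + 1
    grid = [lo + (hi - lo) * t / (n - 1) for t in range(n)]
    return K, lo, hi, [invert_by_bisection(p, K) for p in grid]


def lookup_inverse(x, lo, hi, table):
    """Approximate f^{-1}(x) using the nearest grid point."""
    t = round((x - lo) / (hi - lo) * (len(table) - 1))
    return table[max(0, min(t, len(table) - 1))]
```

Because $f' \ge 1/9$ on the relevant range, an error of half a grid cell in $x$ translates into an error of at most $9/2$ grid cells in $A$, which is the reason a grid of $\Theta(1/\varepsilon)$ points is enough.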
A Appendix



A.l Small Universe Justification 

If n < m^, we can do nothing and already have a universe of size n. Otherwise, let {ii, . . . 
be the set of indices appearing in the stream. Picking a prime q and treating all updates {i,v) as 
(i mod q, v), our estimate of Lq will be unaffected as long as ij-^ ^ ij^ mod q for any ji / j2. There 
are at most differences — and each difference is an integer bounded by n, thus having 
at most logn prime factors. There are thus at most rlogn prime factors dividing some — 
If we pick a random prime q £ [r log(n) log(r log(n)), c • r log(n) log(r log(n))] for a sufficiently large 
constant c, we can ensure with constant probability arbitrarily close to 1 (by increasing c) that no 
indices collide modulo q. Since r < m, we can pick q = 0(poly(m log n)). We then pick a hash 
function h : {0, . . . , g — 1} — > [0(m^)] at random from pairwise independent family. With constant 
probability which can be made arbitrarily high, the mapping i i— > h{i mod q) perfectly hashes the 
indices appearing in the stream. Storing both h and q requires 0(log g+log m) = 0(log m+log log n) 
bits. Since we only apply this scheme when < n, the O(logm) term only appears in our space 
bounds when logm = O(logn). Thus, the cost of this scheme is 0(logA^ + log logn), and the logN 
term is dominated by other factors in all our space bounds. 

A.2 Notes on the Proof of Lemma 2.8

In our proof of Lemma 2.8 we needed a function $f : \mathbb{C} \to \mathbb{C}$ such that the following properties hold
when restricting $f$ to $\mathbb{R}$:

• $f$ is an even function

• $f$ decreases strictly monotonically to 0 as $x$ tends away from 0

• $f$ is strictly positive

Also, to apply Corollary 2.7 we needed $f$ to be holomorphic on $\mathbb{C}$, and we needed
$|f(z)| = e^{O(1 + |\Im(z)|)}$ for $z \in \mathbb{C}$. We now justify why

$$f(x) = -\int_{-\infty}^{x} \frac{\sin^4(y)}{y^3}\,dy$$

has all these properties. First, note the integral exists for all $x$ and thus $f$ is well-defined. Now, $f$
is even since it is the integral of an odd function. It decreases monotonically to 0 as $x$ tends away
from 0 since the sign of $f'(x)$ is the sign of $-\sin^4(x)/x^3$, which is just the sign of $-x$. It is strictly
positive since on the negative reals it is the integral of a strictly positive function, also implying
that $f$ is strictly positive on the positive reals since it is even. This also implies $f(0) > 0$ since $f$
is maximized at 0.

Now, $f$ is holomorphic on $\mathbb{C}$ by construction: it is the integral of a holomorphic function on $\mathbb{C}$.
To see that $f'$ is holomorphic, note $f'(z) = -\mathrm{sinc}^3(z)\sin(z)$ is the product of holomorphic functions.
Lastly, we need to show that $|f(z)| = e^{O(1 + |\Im(z)|)}$. This can be seen using Cauchy's integral theorem,
which lets us choose a convenient curve when computing the line integral from $-\infty$ to $z$ of $f'$. We
choose the curve which goes from $-\infty$ to $\Re(z)$, then goes from $\Re(z)$ to $\Re(z) + i\Im(z)$, thus integrating
the real and imaginary axes separately (here $\Re(z)$ denotes the real part of $z$ and $\Im(z)$ its imaginary
part). The integral on the real part of the curve is bounded by a constant. The integral on the
imaginary part is bounded by $e^{O(1 + |\Im(z)|)}$ since $\sin(z) = (e^{-\Im(z) + i\Re(z)} - e^{\Im(z) - i\Re(z)})/(2i)$. Each term
in the difference is bounded in magnitude by $e^{|\Im(z)|}$.

We also comment on making the constants $C, C'$ explicit in the proof of Lemma 2.8. Recall that, for
the function $g(c) = \mathbf{E}[f(cZ)]$ (where $Z \sim \mathcal{D}_p$), we picked positive constants $C$ large enough and $C'$
small enough such that $g(C)$ and $g(C')$ landed in some desired range. Knowing $C, C'$ is necessary
to understand the quality of the constant-factor approximation the median estimator gives. These
$C, C'$ depend on $p$, and can be found during preprocessing in constant time and space (as a function
of constant $p$) as follows. First, note that $g(c)$ is strictly decreasing on the positive reals with $g(0) = f(0)$
and $\lim_{c\to\infty} g(c) = 0$, and thus we can binary search, using the usual trick of geometrically growing
the interval size we search in since we do not know it a priori. The question then becomes how to
evaluate $g(c)$ at each iteration of the search. $\mathbf{E}[f(cZ)]$ is defined as the integral

$$\int_{-\infty}^{\infty} f(cx)\,q(x)\,dx$$

where $q$ is the probability density function of $\mathcal{D}_p$. We only need to compute this integral to within
constant accuracy, so we can compute this integral numerically in constant time and space. We note
that a clumsy implementation would have to numerically integrate in a 2-level recursion, since $f$ and $q$
themselves are defined as integrals for which we have no closed form. A slicker implementation can
use Parseval's theorem, which tells us that

$$\int_{-\infty}^{\infty} f(cx)\,q(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{c}\,\hat{f}\!\left(\frac{\xi}{c}\right)\hat{q}(\xi)\,d\xi.$$

We claim that the latter integral lets us avoid the recursive integration step because we do have
closed forms for $\hat{f}, \hat{q}$. By definition of $p$-stability, $\hat{q}(\xi) = e^{-|\xi|^p}$. For $\hat{f}$, recall that $f'$ is a
power of $\mathrm{sinc}(x)$ multiplied by $\sin(x)$. The Fourier transform of sinc is the indicator function of an
interval, and that of the sin function is the difference of two shifted $\delta$ functions, scaled by an
imaginary component. By convolution, the transform of $f'$ is thus a piecewise polynomial that can be
written explicitly, and thus we can compute $\hat{f}$ explicitly since integration corresponds to division by
$i\xi$ in the Fourier domain.
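
The search for the constants described above can be sketched as follows. This is only an illustrative sketch: the function name `find_scale`, its parameters, and the stand-in function `g` used in the usage note are assumptions made for the example, not part of the construction, and the real $g(c) = \mathbf{E}[f(cZ)]$ would be evaluated by numerical integration.

```python
import math

def find_scale(g, target_low, target_high, start=1.0, max_iter=200):
    """Return c > 0 with target_low <= g(c) <= target_high.

    Assumes (as in the text) that g is continuous and strictly decreasing
    on the positive reals, that g(0) > target_high, and that g(c) -> 0.
    """
    # Phase 1: the right end of the search interval is not known a priori,
    # so grow it geometrically until g drops to or below the target range.
    lo, hi = 0.0, start
    while g(hi) > target_high:
        lo, hi = hi, 2.0 * hi
    if g(hi) >= target_low:
        return hi
    # Phase 2: ordinary bisection. Invariant: g(lo) > target_high and
    # g(hi) < target_low, so a point of the target range lies in between.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        value = g(mid)
        if value > target_high:
            lo = mid
        elif value < target_low:
            hi = mid
        else:
            return mid
    raise RuntimeError("bisection did not converge")
```

For instance, with the hypothetical stand-in `g = lambda c: math.exp(-c)`, calling `find_scale(g, 0.1, 0.2)` returns a point whose image under `g` lies in `[0.1, 0.2]`. Only constant accuracy is needed, so a constant number of iterations suffices for constant $p$.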



A.3 Details of the Improvement to Armoni's PRG
A.3.1 GUV Extractor Preliminaries

The following preliminary definitions and theorems will be needed throughout Section A.3.
Theorem A.1. The following families of polynomials are irreducible over the given rings:

(1) $x^{2\cdot 3^{\ell}} + x^{3^{\ell}} + 1 \in \mathbb{F}_2[x]$, $\ell \ge 0$

(2) x^' + 2x^'~' - 1 E F7[x], ^ > 1 

(3) $x^{3^{\ell}} + 3 \in \mathbb{F}_7[x]$, $\ell \ge 0$

Proof. Polynomials in family (1) are shown irreducible in Theorem 1.1.28 of [38]. Polynomials in 
families (2) and (3) are shown irreducible in Examples 3.1 and 3.2 of [35]. ■ 

Theorem A.2 ([M], Corollary 3.47). Let $p$ be an irreducible polynomial over $\mathbb{F}_q[x]$ of degree $d$.
Then $p$ is irreducible over $\mathbb{F}_{q^m}[x]$ if and only if $\gcd(m, d) = 1$. ■

The following fact is folklore.

Fact A.3. Multiplication and division with remaindering of two polynomials of degree at most $n$
in $\mathbb{F}_q[x]$ can be performed in time $\mathrm{poly}(n \log q)$ and space $O(n \log q)$.

Definition A.4. A $D$-regular bipartite graph $\Gamma : [N] \times [D] \to [M]$ is a $(\le K, A)$ expander if
$|\Gamma(S)| \ge A \cdot |S|$ for all $S \subseteq [N]$ with $|S| \le K$. Here $\Gamma(x, y)$ is the $y$th neighbor of the left vertex $x$.

Definition A.5. A probability distribution $X$ on $\{0,1\}^n$ is called a $k$-source if $\Pr[X = x] \le 2^{-k}$
for all $x \in \{0,1\}^n$. We interchangeably use "$X$ is a $k$-source" and "$X$ has min-entropy $k$".
Henceforth we let $U_n$ denote the uniform distribution on $\{0,1\}^n$.

Definition A.6. A function $C : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is called a $k \to_{\varepsilon} k'$ condenser if
$C(X, U_d)$ is $\varepsilon$-close in statistical distance to some distribution of min-entropy at least $k'$ whenever
$X$ is a $k$-source. A condenser is called lossless if $k' = k + d$. The statistical distance of two
probability distributions is defined to be half their $L_1$ distance.

Definition A.7. A function $E : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is called a $(k, \varepsilon)$ extractor if $E(X, U_d)$
is $\varepsilon$-close in statistical distance to $U_m$ whenever $X$ is a $k$-source.
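
The two quantities used in Definitions A.5 through A.7 can be computed directly for small explicit distributions. The sketch below is illustrative only; representing a distribution as a dictionary from outcomes to probabilities, and the helper names, are assumptions of the example.

```python
import math

def min_entropy(dist):
    """Min-entropy of a distribution given as {outcome: probability}.

    X is a k-source exactly when min_entropy(X) >= k, i.e. when every
    outcome has probability at most 2^(-k).
    """
    return -math.log2(max(dist.values()))

def statistical_distance(p, q):
    """Statistical distance: half the L1 distance between p and q."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def uniform(outcomes):
    """Uniform distribution on a finite collection of outcomes."""
    outcomes = list(outcomes)
    return {x: 1.0 / len(outcomes) for x in outcomes}
```

For example, the uniform distribution on two of the four 2-bit strings is a 1-source and is at statistical distance 1/2 from $U_2$.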

In Section A.3.2 we will write the expansion of graphs we consider as $(1 - \varepsilon)D$, where
$\varepsilon > 0$ is some parameter. All logarithms below are base-2 unless otherwise stated.

A.3.2 The GUV Extractor in Linear Space

For a given positive integer $h$ and prime power $q$, and for a degree-$n$ irreducible polynomial $E$
over $\mathbb{F}_q$ and positive integer $m$, Guruswami et al. [23] consider the bipartite graph with neighbor
function $\Gamma : \mathbb{F}_q^n \times \mathbb{F}_q \to \mathbb{F}_q^{m+1}$ defined by

$$\Gamma(f, y) = [y, f(y), (f^{h} \bmod E)(y), (f^{h^2} \bmod E)(y), \ldots, (f^{h^{m-1}} \bmod E)(y)] \qquad \text{(A.1)}$$

where $f \in \mathbb{F}_q^n$ is interpreted as a polynomial of degree at most $n - 1$ over $\mathbb{F}_q$. In particular, the $y$th
neighbor of $f$ in the expander is the $y$th symbol of the encoding of $f$ under the Parvaresh-Vardy
code [42]. The authors of [23] then prove the following theorem.

Theorem A.8 (Theorem 3.3 of [23]). The bipartite graph $\Gamma : \mathbb{F}_q^n \times \mathbb{F}_q \to \mathbb{F}_q^{m+1}$ defined as in
Eq. (A.1) is a $(\le K_{\max}, A)$ expander with $K_{\max} = h^m$ and $A = q - (n - 1)(h - 1)m$. ■

For positive integers $N$, $K_{\max} \le N$, and for any $\varepsilon > 0$, and all $\alpha \in (0, \log x/\log\log x)$ with
$x = (\log N)(\log K_{\max})$, [23] then apply Theorem A.8 to analyze the quality of the expander obtained
using the setting of parameters in Figure 3. For our purposes though, we are only concerned with
$0 < \alpha \le 1/2$, $\alpha = \Omega(1)$, and will thus present bounds assuming $\alpha$ in this range. We also assume
$N, K_{\max} \ge 2$.

Theorem A.9 (Theorem 3.5 of [23]). The graph with parameters as stated in Figure 3 yields a
$(\le K_{\max}, (1-\varepsilon)D)$ expander with $N$ left vertices, left-degree $D = O(((\log N)(\log K_{\max})/\varepsilon)^{1+1/\alpha})$,
and $M \le D^2 \cdot K_{\max}^{1+\alpha}$ right vertices. Furthermore, the neighbor function $\Gamma(f, y)$ can be computed
in time $\log^{O(1)}(ND)$, and $D$ and $M$ are each powers of 2. ■

While the setting of parameters in Figure 3 yields an expander whose neighbor function is time-efficient,
for our purposes we need a neighbor function that is both time-efficient and space-efficient.
To accomplish this goal, we use the setting of parameters in Figure 4 instead. Throughout this section,
we borrow much of the notation of [23] for ease of noting differences in the two implementations.

• $n = \log N$

• $k = \log K_{\max}$

• $h = \lceil (2nk/\varepsilon)^{1/\alpha} \rceil$

• $m = \lceil (\log K_{\max})/(\log h) \rceil$

• $q$ is the unique power of 2 in $(h^{1+\alpha}/2, h^{1+\alpha}]$

Figure 3: Setting of parameters in the GUV expander (see proof of Theorem 3.5 in [23]).

• $n$ chosen in $(\log(N)/\log(q), 3\log(N)/\log(q)]$ so that $n = 3^{\ell}$ for some $\ell \in \mathbb{N}$

• $k = \log K_{\max}$

• $z = 3\log(N)k/\varepsilon$

• $\alpha' \le \alpha$ is chosen as large as possible so that $(z^{1+1/\alpha'})/2$ is of the form $7^{2^{\ell}}$ for some $\ell \in \mathbb{N}$

• $h_0 = z^{1/\alpha'}$

• $h = \lceil h_0 \rceil$

• $q = (h_0^{1+\alpha'})/2$

• $m = \lceil (\log K_{\max})/(\log h) \rceil$

Figure 4: New setting of parameters for the GUV expander

Theorem A.10. The graph with parameters as stated in Figure 4 yields a $(\le K_{\max}, (1-\varepsilon)D)$
expander with $N$ left vertices, left-degree $D = O(((\log N)(\log K_{\max})/\varepsilon)^{1+1/\alpha})$, and $M \le D^2 \cdot K_{\max}^{1+\alpha}$
right vertices. Furthermore, the neighbor function $\Gamma(f, y)$ can be computed in time $\log^{O(1)}(ND)$
and space $O(\log(ND))$.

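Proof. The proof is very similar to that of Theorem 3.5 of [23], but taking the new parameters
into account. First we show $\alpha' \ge \alpha/3$. Note there is always an integer of the form $7^{2^{\ell}}$ in
$[t, t^2]$ whenever $t \ge 7$. Since $\alpha \le 1/2$ and $z \ge 3$, we have

$$\left(\frac{z^{1+1/\alpha}}{2}\right)^2 \le \frac{z^{1+3/\alpha}}{2}.$$

Setting $t = z^{1+1/\alpha}/2$, we have $t \ge 3^3/2 > 7$, implying the existence of an integer of the form $7^{2^{\ell}}$ in
$[(z^{1+1/\alpha})/2, (z^{1+1/(\alpha/3)})/2]$ so that $\alpha' \ge \alpha/3$.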

The number of left vertices of $\Gamma$ is $q^n \ge N$. By choice of $m$, $h^{m-1} \le K_{\max} \le h^m$. Thus, the
number of right vertices $M$ satisfies

$$M = q^{m+1} \le q^2 h^{(1+\alpha')(m-1)} \le q^2 h^{(1+\alpha)(m-1)} \le q^2 K_{\max}^{1+\alpha}.$$

The left-degree is

$$D = q \le h^{1+\alpha'} \le (h_0 + 1)^{1+\alpha'} = O((3\log(N)k/\varepsilon)^{1+1/\alpha'}) = O(((\log N)(\log K_{\max})/\varepsilon)^{1+1/\alpha'}),$$

with the penultimate equality following since $\alpha = \Omega(1)$.

The expansion is $A = q - (n-1)(h-1)m \ge q - nhk$. As in [23], we now show $nhk \le \varepsilon q$ so that
$q - nhk \ge q - \varepsilon q = (1-\varepsilon)D$. Since $h^{\alpha'} \ge 3\log(N)k/\varepsilon \ge 3nk/\varepsilon$, we have $nhk \le (\varepsilon/3)h^{1+\alpha'} \le \varepsilon q$.
The final inequality holds since, by the fact that $\alpha' \le \alpha \le 1/2$ and $h_0 \ge z^2 \ge 9$,

$$q = \frac{h_0^{1+\alpha'}}{2} \ge \frac{2((h_0+1)^{1+\alpha'})/3}{2} \ge \frac{\lceil h_0 \rceil^{1+\alpha'}}{3} = \frac{h^{1+\alpha'}}{3}.$$

Calculating $\Gamma(f, y)$ requires performing arithmetic over the finite field $\mathbb{F}_q$, which can be done by
multiplying polynomials in $\mathbb{F}_7[x]$ of degree at most $(\log_7 q) - 1$ modulo an irreducible polynomial
$E'$ of degree $\log_7 q$. By choice of $q$, $E'$ can be taken from family (2) of Theorem A.1. Also, as
stated in Eq. (A.1), we must take powers of $f$ modulo an irreducible $E$ of degree $n$. By choice of $n$,
the polynomial $E$ can be taken from family (3) of Theorem A.1. The irreducibility of $E$ over $\mathbb{F}_q[x]$
follows from Theorem A.2 since $\gcd(2^{\ell}, 3^{\ell'}) = 1$ for any $\ell, \ell'$.

The time complexity is immediate. For space, in calculating $\Gamma(f, y)$, for $k = 0, \ldots, m-1$ we
must calculate $f_k = f^{h^k} \bmod E$ and then evaluate $f_k(y)$. We have $f_k = f_{k-1}^{h} \bmod E$, which
we can calculate time-efficiently in $O(n \log q) = O(\log N + \log q)$ space by iterative successive
squaring. Evaluating $f_k(y)$ takes an additional $O(\log q)$ space. In the end, we must perform
$m + 1$ such evaluations, taking a total of $O(m \log q) = O(\log M)$ space. The total space is thus
$O(\log N + \log D + (1+\alpha)\log K_{\max}) = O(\log(DN))$ since $K_{\max} \le N$ and $\alpha = O(1)$. ■
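
The space argument above keeps only the current $f_k$ in memory and obtains the next one by repeated squaring. The following sketch illustrates that loop under simplifying assumptions that are not the paper's parameter setting: the base field is taken to be $\mathbb{F}_7$ itself (rather than $\mathbb{F}_q$ for a larger power $q$ of 7), the modulus is fixed to $E = x^3 + 3$ (an instance of family (3) as read above), and all function names are hypothetical.

```python
P = 7                 # assumed base field F_7 (the text uses F_q with q a power of 7)
E = [3, 0, 0, 1]      # assumed modulus x^3 + 3, coefficients from low to high degree

def poly_mulmod(a, b, mod=E, p=P):
    """Multiply two polynomials of degree < n and reduce modulo the monic `mod`."""
    n = len(mod) - 1
    prod = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % p
    # Eliminate degrees >= n using x^n = -(mod[0] + ... + mod[n-1] x^(n-1)).
    for d in range(len(prod) - 1, n - 1, -1):
        c = prod[d]
        if c:
            for j in range(n):
                prod[d - n + j] = (prod[d - n + j] - c * mod[j]) % p
            prod[d] = 0
    return prod[:n]

def poly_powmod(f, e):
    """Compute f^e mod E by iterative successive squaring."""
    result = [1] + [0] * (len(E) - 2)
    base = list(f)
    while e:
        if e & 1:
            result = poly_mulmod(result, base)
        base = poly_mulmod(base, base)
        e >>= 1
    return result

def poly_eval(f, y, p=P):
    """Evaluate f at y by Horner's rule."""
    acc = 0
    for c in reversed(f):
        acc = (acc * y + c) % p
    return acc

def neighbor(f, y, h, m):
    """Return [y, f(y), (f^h mod E)(y), ..., (f^(h^(m-1)) mod E)(y)]."""
    out = [y]
    fk = list(f)                  # f_0 = f
    for _ in range(m):
        out.append(poly_eval(fk, y))
        fk = poly_powmod(fk, h)   # f_{k+1} = f_k^h mod E; the old f_k is discarded
    return out
```

Only one polynomial of degree below $n$ and one partial result are live at any time, which is the point of the $O(n \log q)$ space bound; the output symbols could equally be streamed out rather than collected in a list.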

Given their expander construction, the authors of [23] then use an argument of Ta-Shma et
al. [46] that for positive integers $n, m, d$ and for $\varepsilon \in (0, 1)$ and $k \in [0, n]$, a $(\le \lceil 2^k \rceil, (1-\varepsilon) \cdot 2^d)$
expander yields a $k \to_{\varepsilon} k + d$ condenser. Specifically, as argued in [46], the constructed expander is
a condenser, where the input string is treated as a left vertex of an expander with left-degree $2^d$, and
the output string is the index of the right-hand side vertex obtained by following the random edge
corresponding to the seed. GUV could immediately apply this connection to obtain a condenser
since their $M, D$ of Theorem A.9 were powers of 2. In Theorem A.10, however, $D, M$ are not powers
of 2 (they are powers of 7). Dealing with $M$ not being a power of 2 is simple: one can add dummy
vertices to the right-hand side of the expander to make $M$ a power of 2, at most doubling $M$ in the
process. The problem with $D$ not being a power of 2, though, is that a seed $s$ of length $d = \lceil \log D \rceil$
does not yield a uniformly random neighbor if one interprets $s$ modulo $D$. To deal with this issue, if
we desire a condenser whose output has statistical distance $\varepsilon$ from a $k'$-source, we increase the seed
length to $d = \lceil \log D \rceil + \lceil \log(1/\varepsilon) \rceil + 1$. Now, interpreting the seed as a number in a range of size at
least $(2/\varepsilon)D$, the seed modulo $D$ does yield a random neighbor conditioned on the good event that
the seed is smaller than $D\lfloor 2^d/D \rfloor$, which happens with probability at least $1 - \varepsilon/2$. Statistical
distance $\varepsilon$ can thus be achieved as long as the expander has expansion at least $(1 - \varepsilon/2)D$. This
gives the following theorem.

Theorem A.11 (Based on Theorem 4.3 of [23]). For every positive integer $n$, and every $k_{\max} \le n$,
$\varepsilon > 0$, and $0 < \alpha \le 1/2$, $\alpha = \Omega(1)$, there is a function $C : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ with
$d = (1 + 1/\alpha) \cdot (\log n + \log k_{\max}) + O(\log(1/\varepsilon))$ and $m \le 2d + (1+\alpha)k_{\max} + 1$ such that for all
$k \le k_{\max}$, $C$ is a $k \to_{\varepsilon} k + d$ lossless condenser. Furthermore, for any $x \in \{0,1\}^n$ and $s \in \{0,1\}^d$,
$C(x, s)$ can be computed in $O(n + \log(1/\varepsilon))$ space and $\mathrm{poly}(n \log(1/\varepsilon))$ time.

Proof. The proof of the theorem, except for the space upper bound, can be found as Theorem 4.3
of [23]. The space requirement follows from Theorem A.10. ■
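
The seed-lengthening step that precedes Theorem A.11 can be checked exactly for small parameters. The sketch below computes the statistical distance between "a uniform $d$-bit seed reduced modulo $D$" and the uniform distribution on $D$ neighbors; the function names are hypothetical, and exact rational arithmetic is used so that no rounding enters the comparison.

```python
import math
from fractions import Fraction

def seed_length(D, eps):
    """Seed length d = ceil(log D) + ceil(log(1/eps)) + 1 from the text."""
    return math.ceil(math.log2(D)) + math.ceil(math.log2(1 / eps)) + 1

def distance_from_uniform(D, d):
    """Exact statistical distance between (s mod D), for s uniform in
    [0, 2^d), and the uniform distribution on [0, D)."""
    total = 1 << d
    q, r = divmod(total, D)
    # r residues receive q + 1 seeds each; the other D - r receive q seeds.
    l1 = (r * abs(Fraction(q + 1, total) - Fraction(1, D))
          + (D - r) * abs(Fraction(q, total) - Fraction(1, D)))
    return l1 / 2
```

With $D = 49$ and $\varepsilon = 0.1$, for example, the formula gives $d = 11$ and the resulting distance is well below $\varepsilon/2$, in line with the argument that the bad event has probability at most $\varepsilon/2$.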

In the construction of one of their extractors (the extractor we will be concerned with), [23]
uses the following extractor of Impagliazzo, Levin, and Luby [24] as a subroutine, based on the
leftover hash lemma.

Theorem A.12 (Based on [24]). For all integers $n = 2 \cdot 3^{\ell}$, $k \le n$, with $\ell \ge 0$ an integer, and for all
$\varepsilon > 0$, there is a $(k, \varepsilon)$ extractor $E : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ with $d = n$ and $m \ge k + d - 2\log(1/\varepsilon)$
such that for all inputs $x, y$, $E(x, y)$ can be computed simultaneously in space $O(n)$ and time $n^{O(1)}$.

Proof. The proof is sketched in [23] (and given fully in [24]) except for the analysis of space
complexity. We review the scheme so that we may prove the space bound. Elements of $\{0,1\}^n$ are
treated as elements of $\mathbb{F}_{2^n}$, and $E(x, y) = (y, xy|_m)$, where $xy|_m$ is the first $\lceil k + d - 2\log(1/\varepsilon) \rceil$ bits
of the product $xy$ over $\mathbb{F}_{2^n}$. The time and space complexity are thus dictated by the complexity of
multiplying two elements of $\mathbb{F}_{2^n}$ and remaindering modulo an irreducible polynomial $E$ of
degree $n$. By the form of $n$, we can take $E$ to be from family (1) in Theorem A.1. The claim then
follows by Fact A.3. ■
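
The scheme reviewed in the proof of Theorem A.12 amounts to one multiplication in $\mathbb{F}_{2^n}$ followed by truncation. A minimal sketch follows, with several assumptions of ours: field elements are encoded as Python integers whose bit $i$ is the coefficient of $x^i$, "the first bits" of the product are taken to be its high-order bits, and the function names are hypothetical.

```python
def family1_modulus(ell):
    """Return (modulus, n) for x^(2*3^ell) + x^(3^ell) + 1 over F_2."""
    e = 3 ** ell
    return (1 << (2 * e)) | (1 << e) | 1, 2 * e

def gf2_mulmod(a, b, modulus, n):
    """Multiply a and b in F_2[x]/(modulus), where deg(modulus) = n and a, b < 2^n."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current shift of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> n) & 1:
            a ^= modulus         # reduce as soon as degree n is reached
    return result

def hash_extract(x, y, ell, out_bits):
    """Return (y, truncated product x*y), the shape of E(x, y) in the proof.

    Taking the high-order `out_bits` bits as "the first bits" is an assumption.
    """
    modulus, n = family1_modulus(ell)
    prod = gf2_mulmod(x, y, modulus, n)
    return y, prod >> (n - out_bits)
```

The working storage is a constant number of $n$-bit registers, matching the $O(n)$ space claim; the shift-and-reduce loop is the simplest (not the fastest) way to multiply and is used here only for clarity.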
The authors of [23] then give the following extractor construction.

Lemma A.13 (based on Lemma 4.11 of [23]). For every integer $t \ge 1$ and all positive integers
$n \ge k$ and all $\varepsilon > 0$, there is a $(k, \varepsilon)$ extractor $E : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ that can be computed
in $\mathrm{poly}(n \log(1/\varepsilon))$ time and $O(n + \log(1/\varepsilon))$ space with $m = \lceil k/2 \rceil$ and $d \le k/t + O(\log(n/\varepsilon))$.

Proof. The analysis of running time and proofs of correctness and output length are identical to
[23], so we focus on space analysis. We now review the algorithm of [23] for computing $E(x, y)$.

1. Round $t$ to a positive integer and set $\varepsilon_0 = \varepsilon/(4t + 1)$.

2. Apply the condenser of Theorem A.11 with error $\varepsilon_0$, $\alpha = 1/(6t)$, min-entropy $k$, and seed
length $d' = O(\log(n/\varepsilon))$ to $x$, using the first $d'$ bits of $y$. The output $x'$ of the condenser will
be of length at most $n' = (1 + \alpha)k + O(\log(n/\varepsilon))$.

3. Partition $x'$ into $2t$ blocks $x'_1, \ldots, x'_{2t}$ of size $n'' = \lfloor n'/(2t) \rfloor$ or $n'' + 1$ and set
$k'' = k/(3t) - O(\log(n/\varepsilon))$.

4. Let $E''$ be the extractor of Theorem A.12 for min-entropy $k''$ with input length $n'' + 1$, seed
length $d'' = k/t + O(\log(n/\varepsilon))$, and error parameter $\varepsilon_0$. For this setting of parameters, the
output length of $E''$ will be $m'' \ge \max\{d'', k'' + d'' - 2\log(1/\varepsilon_0)\}$. Now output $(z_1, \ldots, z_{2t})$
where $y_{2t}$ is the last $d - d' = d''$ bits of $y$, and for $i = 2t, \ldots, 1$, $(y_{i-1}, z_i)$ is defined inductively
to be a partition of $E''(x'_i, y_i)$ into a $d''$-bit prefix and an $(m'' - d'')$-bit suffix.

We now analyze the space complexity of computing this extractor. First we note $d = k/t +
O(\log(n/\varepsilon)) = O(n + \log(1/\varepsilon))$. Step 2 requires $O(d' + (1 + \alpha)n)$ space, which is $O(n + \log(1/\varepsilon))$. To
apply $E''$, by Theorem A.12 we need $n'' + 1$ to be of the form $2 \cdot 3^{\ell}$; for now assume this, and we will fix
this later. Each evaluation of $E''$ in Step 4 takes space $O(n'') = O(n'/t) = O((1 + \alpha)k + \log(n/\varepsilon)) =
O(n + \log(1/\varepsilon))$. We also have to maintain the $z_i$ as we generate them, but we can stop the recursive
applications of $E''$ in Step 4 once we have extracted $\lceil k/2 \rceil$ bits. Also, there are only $2t = O(1)$
levels of recursion in Step 4, so an implementation can keep track of the current level of recursion
with only $O(1)$ bits of bookkeeping. The total seed length is $d'' + d' = k/t + O(\log(n/\varepsilon))$.

Now, to fix the fact that $n'' + 1$ might not be of the form $2 \cdot 3^{\ell}$, we increase $n''$ so that this does
hold. Doing so increases $n''$ by at most a factor of 3. Since $d'' = n''$, we increase the seed to be of
length $3k/t + O(\log(n/\varepsilon))$, but this can be remedied by applying the above construction for $t' = 3t$.
■

We now come to the final theorem we will use from [23].

Theorem A.14 (Theorem 4.17 of [23]). For all positive integers $n > 0$, $k \le n$ and for all $\varepsilon > 0$,
there is a $(k, \varepsilon)$ extractor $E : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ with $d = O(\log n + \log(1/\varepsilon))$ and $m \ge k/2$
where $E(x, y)$ can be computed in time $\mathrm{poly}(n \log(1/\varepsilon))$ and space $O(n + \log(1/\varepsilon))$ for all $x, y$.

Proof. The construction here is completely unchanged from [23]. We only analyze the space
complexity, as the time complexity and extractor parameters were analyzed in [23]. To perform
this analysis, we present the details of the construction.

Define $\varepsilon_0 = \varepsilon/\mathrm{poly}(n)$. For any integer $k$, define $i(k)$ to be the smallest integer $i$ such that
$k \le 2^i \cdot 8d$ with $d = c\log(n/\varepsilon_0)$ for some large constant $c$. For every $k \in [0, n]$, GUV define an
extractor $E_k$ recursively. In their base case, $i(k) = 0$ so that $k \le 8d$. Here they apply Lemma A.13
with $t = 9$.

For $i(k) > 0$, $E_k(x, y)$ is evaluated as follows.

1. Apply Theorem A.11 to $x$ with a seed length of $O(\log(n/\varepsilon_0))$ to obtain a string $x'$ of length
$(9/8)k + O(\log(n/\varepsilon_0))$.

2. Divide $x'$ into two equal-sized halves $x'_1, x'_2$. Set $k' = k/2 - k/8 - O(\log(n/\varepsilon_0))$, for which
$k' \ge 2d$ by setting $c$ sufficiently large. Set $E' = E_{k'}$, which has seed length $d_1 = d$, and obtain
a $(2d, \varepsilon_0)$ extractor $E''$ from Lemma A.13 with $t = 16$, output length $d$, and seed length
$d_2 = d/8 + O(\log(n/\varepsilon_0))$.

3. Apply $E''$ to $x'_2$ to yield an output $y$. Output $E'(x'_1, y)$, which has length at least $k/6$.

The total seed length is $d/8 + O(\log(n/\varepsilon_0))$. To yield $k/2$ bits of output and not just $k/6$, repeat
Steps 1 through 3 above but with $k$ replaced by $k_2 = 5k/6 - 1$. Then repeat again with $k_3 = 5k_2/6 - 1$,
then $k_4 = 5k_3/6 - 1$. The total number of output bits is then $(1 - (5/6)^4)k - O(1) > k/2$, and the
seed length has increased by a factor of 4, but is still at most $d$.

Now we analyze space complexity. Steps 1 through 3 above are performed four times at any
recursive level, so we have a recursion tree with branching factor four and height $O(\log k)$. At a
level of recursion where we handle some min-entropy $k''$, the input at that level is some $x''$ of length
$\Theta(k'')$ along with a seed of length $d$ (except for the topmost level, which has input $x$ of length $n$). For
all levels but the bottommost in the recursion tree, we have $k'' \ge d$, so that the total space needed
to store all inputs at all levels when performing the computation depth-first on the recursion tree
is bounded by a geometric series with largest value $\Theta(k + d)$. At each non-leaf node of recursion
we run the extractor from Lemma A.13 four times, each with input length $\Theta(k'')$ and seed length
$d \le k''$, using space $O(k'')$. We therefore have that no level uses space more than $O(k'')$ to perform
its computations. The total space to calculate $E_k$ is thus $O(n + k + d) = O(n + \log(1/\varepsilon))$. ■

A.3.3 Applying GUV to Armoni's PRG 

We begin with a formal definition of a pseudorandom generator. 

Definition A.15. A function $G : \{0,1\}^{l} \to \{0,1\}^{R}$ is a $\gamma$-pseudo-random generator ($\gamma$-PRG) for
space $S$ with $R$ random bits if any space-$S$ machine $M$ with one-way access to $R$ random bits is
$\gamma$-fooled by $G(U_l)$. That is, if we let $M(x, y)$ denote the final state of the machine $M$ on an input
$x$ and $R$-bit string $y$, then $\|M(x, U_R) - M(x, G(U_l))\| \le \gamma$ for all inputs $x$, where $\|A - B\|$ denotes the
statistical difference between two distributions $A, B$.

Armoni defines a $\gamma$-PRG slightly differently. Namely, in his definition the machine $M$ outputs
a binary answer ("accept" or "reject"), and he only requires that the distribution of the decision
made by $M$ changes by at most $\gamma$ in statistical distance for any input. However, the PRG
construction he gives in fact satisfies Definition A.15. This is because he models the machine's
execution on an input $x$ by a branching program with $R$ layers and width $2^S$ in each layer. One
should interpret nodes in the branching program as states of the algorithm, where each node in
the $i$th layer, $i < R$, has out-degree 2 into the $(i+1)$st layer with the edges labeled 0 and 1
corresponding to the $i$th bit of randomness. Armoni then actually provides a PRG which $\gamma$-fools
branching programs with respect to the distribution of the final ending node, i.e., the final state of
the algorithm.

Henceforth we summarize the PRG construction of Armoni [3] to illustrate that the space-efficient
implementation of the GUV extractor described in Theorem A.14 gives a PRG using seed
length $O((S/(\log S - \log\log R + O(1)))\log R)$ to produce $R$ pseudorandom bits for any $R = 2^{O(S)}$
which fool space-$S$ machines with one-way access to their randomness. We assume $R \ge S$, since
otherwise the machine could afford to store all random bits it uses.

To use notation similar to that of Armoni, for an extractor $E : \{0,1\}^k \times \{0,1\}^t \to \{0,1\}^r$, define
$G_{E,n} : \{0,1\}^{k+nt} \to \{0,1\}^{nr}$ by

$$G_{E,n}(x, y_1, y_2, \ldots, y_n) = E(x, y_1) \cdots E(x, y_n)$$

where $x \in \{0,1\}^k$ and $y_i \in \{0,1\}^t$. To obtain a $\gamma$-PRG, Armoni recursively defines functions
$G_i : \{0,1\}^k \times \{0,1\}^{(i-1)k'} \times \{0,1\}^{n_i t} \to \{0,1\}^R$ as follows.

1. $G_1(x_1, y_1, \ldots, y_{n_1}) = G_{E,n_1}(x_1, y_1, \ldots, y_{n_1})$

2. $G_i(x_1, \ldots, x_i, y_1, \ldots, y_{n_i}) = G_{i-1}(x_1, \ldots, x_{i-1}, G_{E',n_i}(x_i, y_1, \ldots, y_{n_i}))$

where $k = O(S)$, $k' = O(S + \log(1/\gamma))$, $t = O(\log(R/\gamma))$, and $n_i = n_{i-1}/\Theta(S/(\log R + \log(1/\gamma)))$
for $i > 0$ with $n_0 = R$. The extractor $E$ has input length $k$ and seed length $t$, while $E'$ has input
length $k'$ and seed length $t$. The string $x_1$ is in $\{0,1\}^k$, while $x_2, \ldots, x_i \in \{0,1\}^{k'}$ and $y_i \in \{0,1\}^t$.
The final PRG is defined as $G = G_h$, with $h = \Theta(\log(R)/\max\{1, \log(S) - \log\log(R/\gamma)\})$. For
each $i < h$, the output of $G_{E',n}$ is split into equal-size blocks of size $t$ to obtain the $y_1, \ldots, y_{n_i}$ for
$G_{i+1}$.
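
The shape of the recursion above (expand a list of seeds with one extractor input, cut the result into $t$-bit blocks, and recurse) can be sketched as follows. Everything concrete here is an assumption for illustration: bit strings are Python strings of `'0'`/`'1'`, `toy_extractor` is a hash-based stand-in and NOT a real extractor (it is not the GUV extractor of Theorem A.14), and halving the input length to get the output length is a choice made only for this example.

```python
import hashlib

def toy_extractor(x, y, out_len):
    """Hash-based stand-in mapping (x, y) to out_len bits; NOT a real extractor."""
    digest = hashlib.sha256((x + "|" + y).encode()).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:out_len]          # out_len must be at most 256 in this toy

def G_E_n(x, ys, out_len):
    """G_{E,n}(x, y_1, ..., y_n) = E(x, y_1) ... E(x, y_n) (concatenation)."""
    return "".join(toy_extractor(x, y, out_len) for y in ys)

def armoni_G(xs, ys, t):
    """G_i(x_1, ..., x_i, y_1, ..., y_n): peel off x_i, expand the seeds,
    split the result into t-bit blocks, and recurse on x_1, ..., x_{i-1}."""
    *rest, x_last = xs
    expanded = G_E_n(x_last, ys, len(x_last) // 2)
    if not rest:
        return expanded
    blocks = [expanded[j:j + t] for j in range(0, len(expanded) - t + 1, t)]
    return armoni_G(rest, blocks, t)
```

With $x_1$ of 16 bits, $x_2, x_3$ of 8 bits, two 2-bit seeds and $t = 2$, the three levels stretch 2 seeds to 4, then 8, then a 64-bit output. This version materializes each level in full; the space-efficient evaluation discussed in the text instead recomputes only the single block needed for one requested output bit.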

In Armoni's proof of correctness of his PRG, he needs the following type of extractor. For every
integer $\ell$ and every $\varepsilon > 0$, he requires an $(\ell/2, \varepsilon)$ extractor $E : \{0,1\}^{\ell} \times \{0,1\}^{d} \to \{0,1\}^{\ell/2}$ with
$d = \Theta(\log(\ell/\varepsilon))$. The extractors $E, E'$ above must be taken to have these parameters with $\ell = k$
and $\ell = k'$. By Theorem A.14 we know such $E, E'$ can be chosen that can be evaluated in space
$O(k + t)$ and $O(k' + t)$, respectively.¹ We now analyze the space complexity of computing any single
bit in the output of $G$. We must store a seed of length $O(k + k'(h - 1) + t)$, which is
$O(((S + \log(1/\gamma))/\max\{1, \log S - \log\log(R/\gamma)\})\log R)$ (see Theorem 2 of [3] for a detailed calculation). To
calculate a single output bit, in a recursive implementation there are $h = O(\log R) = O(S)$ levels of
recursion, and in each we must evaluate either $E$ or $E'$ on some $y_i$, split the output of that evaluation
into blocks, then recurse. At a level $i$ of recursion we need to know the seed $y_i$ we have recursed on,
as well as which output bit $b_i$ we will want in $G_i(x_1, x_2, \ldots, x_i, y)$. The value $b_i$ fits into at most
$\log R$ bits, and the length of $y_i$ is $t = O(\log(R/\gamma))$. Note though that once we have calculated $y_{i-1}$
and $b_{i-1}$ for our recursive step to the $(i-1)$st level, we no longer need to know $y_i$ and $b_i$. Thus,
the $y_i$ and $b_i$ can be kept in a global register, taking a total of $O(\log(R/\gamma)) = O(S + \log(1/\gamma))$
bits throughout the entire recursion. At each level of recursion we must perform one evaluation of
an extractor, which takes space $O(k' + t) = O(S + \log(1/\gamma))$. We thus have the following theorem,
which extends Corollary 1 of [3] by working for the full range of $R$, as opposed to just $R < 2^{S^{1-\delta}}$
for some $\delta > 0$.

¹We note Armoni defines extractors to be "strong", i.e. the seed appears at the end of the output. It is known
that the GUV extractor can be easily made strong with no increase in complexity (see Remark 4.22 of [23]).

Theorem A.16. For any $\gamma > 0$ and integers $S \ge 1$, $R = 2^{O(S)}$, there is a $\gamma$-PRG stretching
$O\!\left(\frac{S + \log(1/\gamma)}{\max\{1, \log S - \log\log(R/\gamma)\}} \log R\right)$ bits of seed to $R$ pseudorandom bits $\gamma$-fooling space-$S$ machines such
that any of the $R$ output bits can be computed in space $O(S + \log(1/\gamma))$ and time $\mathrm{poly}(S\log(1/\gamma))$.
■

We note that Indyk's algorithm is designed to succeed with constant probability (say, 2/3), so
in the application of Theorem A.16 to his algorithm, $\gamma$ is a constant.



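A.4 A balls and bins process

Consider the following random process which arises in the analysis of both our $F_0$ and $L_0$ algorithms.
We throw a set of $A$ "good" balls and $B$ "bad" balls into $K$ bins at random. In the analysis of our
$L_0$ algorithm, we will be concerned with the special case $B = 0$, whereas the $F_0$ algorithm analysis
requires understanding the more general random process. We let $X_i$ denote the random variable
indicating that at least one good ball, and no bad balls, landed in bin $i$, and we let $X = \sum_{i=1}^{K} X_i$.
We now prove a few lemmas.

Lemma A.17.

$$\mathbf{E}[X] = K\left(1 - \frac{1}{K}\right)^{B}\left(1 - \left(1 - \frac{1}{K}\right)^{A}\right)$$

and

$$\mathrm{Var}[X] = K\left(1 - \frac{1}{K}\right)^{B}\left(1 - \left(1 - \frac{1}{K}\right)^{A}\right) + K(K-1)\left(1 - \frac{2}{K}\right)^{B}\left(1 - 2\left(1 - \frac{1}{K}\right)^{A} + \left(1 - \frac{2}{K}\right)^{A}\right) - K^2\left(1 - \frac{1}{K}\right)^{2B}\left(1 - 2\left(1 - \frac{1}{K}\right)^{A} + \left(1 - \frac{1}{K}\right)^{2A}\right).$$

Proof. The computation for $\mathbf{E}[X]$ follows by linearity of expectation.
For $\mathrm{Var}[X]$, we have

$$\mathrm{Var}[X] = \mathbf{E}[X^2] - \mathbf{E}^2[X] = \sum_{i} \mathbf{E}[X_i^2] + 2\sum_{i<j} \mathbf{E}[X_iX_j] - \mathbf{E}^2[X].$$

We have $\mathbf{E}[X_i^2] = \mathbf{E}[X_i]$, so the first sum is simply $\mathbf{E}[X]$. We now calculate $\mathbf{E}[X_iX_j]$ for $i \ne j$. Let
$Y_i$ indicate that at least one good ball landed in bin $i$, and let $Z_i$ indicate that at least one bad ball
landed in bin $i$. Then,

$$\begin{aligned}
\mathbf{E}[X_iX_j] &= \Pr[Y_i \wedge Y_j \wedge \bar{Z}_i \wedge \bar{Z}_j] \\
&= \Pr[\bar{Z}_i \wedge \bar{Z}_j] \cdot \Pr[Y_i \wedge Y_j \mid \bar{Z}_i \wedge \bar{Z}_j] \\
&= \Pr[\bar{Z}_i \wedge \bar{Z}_j] \cdot \Pr[Y_i \wedge Y_j] \\
&= \left(1 - \frac{2}{K}\right)^{B} \cdot \left(1 - \Pr[\bar{Y}_i \wedge \bar{Y}_j] - \Pr[Y_i \wedge \bar{Y}_j] - \Pr[\bar{Y}_i \wedge Y_j]\right) \\
&= \left(1 - \frac{2}{K}\right)^{B} \cdot \left(1 - \Pr[\bar{Y}_i \wedge \bar{Y}_j] - 2 \cdot \Pr[Y_i \wedge \bar{Y}_j]\right) \\
&= \left(1 - \frac{2}{K}\right)^{B} \cdot \left(1 - \left(1 - \frac{2}{K}\right)^{A} - 2\left(1 - \frac{1}{K}\right)^{A}\left(1 - \left(1 - \frac{1}{K-1}\right)^{A}\right)\right).
\end{aligned}$$

The variance calculation then follows by noting $2\sum_{i<j} \mathbf{E}[X_iX_j] = K(K-1)\mathbf{E}[X_1X_2]$, then
expanding out $\mathbf{E}[X] + K(K-1)\mathbf{E}[X_1X_2] - \mathbf{E}^2[X]$. ■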
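
As a sanity check on the closed forms of Lemma A.17 (in the form $\mathbf{E}[X] = K(1-1/K)^B(1-(1-1/K)^A)$ and $\mathbf{E}[X_1X_2] = (1-2/K)^B(1 - 2(1-1/K)^A + (1-2/K)^A)$), the sketch below compares them with an exhaustive enumeration over all placements of a few balls. The function names are hypothetical, and exact rational arithmetic is used so that the comparison is an equality rather than an approximation.

```python
from fractions import Fraction
from itertools import product

def expected_X(A, B, K):
    """E[X] = K (1 - 1/K)^B (1 - (1 - 1/K)^A)."""
    a = Fraction(K - 1, K)
    return K * a**B * (1 - a**A)

def variance_X(A, B, K):
    """Var[X] = E[X] + K(K-1) E[X_1 X_2] - E[X]^2."""
    a = Fraction(K - 1, K)
    b = Fraction(K - 2, K)
    pair = b**B * (1 - 2 * a**A + b**A)     # E[X_1 X_2]
    mean = expected_X(A, B, K)
    return mean + K * (K - 1) * pair - mean * mean

def brute_force(A, B, K):
    """Exact mean and variance of X by enumerating all K^(A+B) placements."""
    total = total_sq = count = 0
    for placement in product(range(K), repeat=A + B):
        good = set(placement[:A])
        bad = set(placement[A:])
        x = len(good - bad)      # bins with a good ball and no bad ball
        total += x
        total_sq += x * x
        count += 1
    mean = Fraction(total, count)
    return mean, Fraction(total_sq, count) - mean * mean
```

The enumeration is exponential in $A + B$ and is meant only for tiny instances such as $A = 3$, $B = 2$, $K = 3$.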

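Lemma A.18. If $A \ge K/160$ and $A, B \le K/2$, then $\mathbf{E}[X] \ge K/500$.
Proof. Applying Lemma A.17,

$$\mathbf{E}[X] = K\left(1 - \frac{1}{K}\right)^{B}\left(1 - \left(1 - \frac{1}{K}\right)^{A}\right) \ge \frac{K}{2} \cdot \frac{A}{K}\left(1 - \frac{A}{2K}\right) \ge \frac{K}{320} \cdot \frac{3}{4} \ge \frac{K}{500}. \;■$$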

In the next lemma, we use the following inequalities.
Lemma A.19 (Motwani and Raghavan [37, Proposition B.3]). For all $t, n \in \mathbb{R}$ with $n \ge 1$ and
$|t| \le n$,

$$e^{t}\left(1 - \frac{t^2}{n}\right) \le \left(1 + \frac{t}{n}\right)^{n} \le e^{t}.$$

Lemma A.20. If $A, B \le K/4$, then $\mathrm{Var}[X] \le 7K$.
Proof. Applying Lemma A.17 and Lemma A.19,

^ N iA+B)/K 



/ 1 \ ' 

Var[A] < Ke-^/^-i^e-(^+^)/^(l--ij 

+ i^(i^-l)e-2S//^ l-2e-^/^(l--ij + e'^^/^ 

/ / , N 2B/K / , ^ 2{A+B)/K-^ 

k2^-2B/k ( f 1 - 1 j - 2e-^/^ + ( 1 - -i 
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Now using the fact that A,B< K/4 and combining like terms, 

Var[X] < K (^e"^/^ - e-(^+^)/^' ~ 1<) ~ ^ 2e-(^+2^)/^ - e'^^/^) 

+ [^~1BIK _ 2^-{A+2B)/K _ + e-2(A+i?)/K _ ^-2B/K _ ^ 



Each of the positive terms multiplying K above is upper bounded by either 1 or 2, and we have 
Var[X] < 7K. ■ 

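Lemma A.21. If $B = 0$ and $100 \le A \le K/20$, then $\mathrm{Var}[X] \le 4A^2/K$.
Proof. By Lemma A.17,

$$\begin{aligned}
\mathrm{Var}[X] &= K(K-1)\left(1 - \frac{2}{K}\right)^{A} + K\left(1 - \frac{1}{K}\right)^{A} - K^2\left(1 - \frac{1}{K}\right)^{2A} \\
&= K(K-1)\left(1 - \frac{2}{K}\right)^{A} + K\left(1 - \frac{1}{K}\right)^{A} - K^2\left(1 - \frac{2}{K}\right)^{A}\left(1 + \frac{1}{K^2(1 - 2/K)}\right)^{A} \\
&= K(K-1)\left(1 - \frac{2}{K}\right)^{A} + K\left(1 - \frac{1}{K}\right)^{A} - K^2\left(1 - \frac{2}{K}\right)^{A}\left(1 + \frac{A}{K^2(1 - 2/K)} + E_1\right) \\
&= K\left(1 - \frac{A}{K} + E_2\right) - K\left(1 - \frac{2A}{K} + E_3\right) - A\left(1 - \frac{2}{K}\right)^{A-1} - K^2E_1\left(1 - \frac{2}{K}\right)^{A}
\end{aligned}$$

where $E_1$, $E_2$, and $E_3$ are the sum of quadratic and higher terms of the binomial expansions for
$(1 + 1/(K^2(1 - 2/K)))^{A}$, $(1 - 1/K)^{A}$, and $(1 - 2/K)^{A}$, respectively. Continuing the expansion,

$$\begin{aligned}
\mathrm{Var}[X] &= A + K(E_2 - E_3) - A\left(1 - \frac{2}{K}\right)^{A-1} - K^2E_1\left(1 - \frac{2}{K}\right)^{A} \\
&= A + K(E_2 - E_3) - A\left(1 - \frac{2(A-1)}{K} + E_4\right) - K^2E_1\left(1 - \frac{2}{K}\right)^{A} \\
&= \frac{2A(A-1)}{K} - AE_4 - K^2E_1\left(1 - \frac{2}{K}\right)^{A} + K(E_2 - E_3),
\end{aligned}$$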

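where $E_4$ is the sum of quadratic and higher terms of the binomial expansion of $(1 - 2/K)^{A-1}$. Since
$100 \le A \le K/20$, we have that $|E_4|$ is bounded by a geometric series with starting value
$(2/K)^2(A-1)^2/2 \le 2A(A-1)/K^2 \le (A-1)/(5K)$ and common ratio at most $2(A-1)/K \le 2A/K \le 1/10$,
and so $|E_4| \le ((A-1)/(5K))/(1 - 1/10) = 2(A-1)/(9K)$. Thus, $|AE_4| \le 2A(A-1)/(9K)$.

Arguing similarly, we see that $E_1$ is at most $A^2/(K^4(1 - A/K^2)) \le 2A^2/K^4$ for sufficiently
large $K$. It follows that

$$\left|K^2E_1\left(1 - \frac{2}{K}\right)^{A}\right| \le K^2E_1 \le \frac{2A^2}{K^2} \le \frac{A(A-1)}{9K}$$

for sufficiently large $K$.

Finally, we look at $E_2 - E_3$,

$$E_2 - E_3 = \left(\binom{A}{2}\frac{1}{K^2} - \binom{A}{3}\frac{1}{K^3} + \cdots\right) - \left(4\binom{A}{2}\frac{1}{K^2} - 8\binom{A}{3}\frac{1}{K^3} + \cdots\right).$$

This series can be upper bounded by the series $\sum_{i=2}^{\infty} \frac{(2^i - 1)(A/K)^i}{i!}$ and lower bounded by the series
$-\sum_{i=2}^{\infty} \frac{(2^i - 1)(A/K)^i}{i!}$. This series, in absolute value, is bounded by a geometric series with starting term
$3A^2/(2K^2)$ and common ratio at most $A/K \le 1/20$. Thus, $|E_2 - E_3| \le \frac{20}{19} \cdot \frac{3}{2}(A/K)^2 = \frac{30}{19}(A/K)^2$. It
follows that $|K(E_2 - E_3)| \le \frac{30}{19}A^2/K \le \frac{3000}{1881} \cdot A(A-1)/K$, since $A \ge 100$.
Hence,

$$|AE_4| + \left|K^2E_1\left(1 - \frac{2}{K}\right)^{A}\right| + |K(E_2 - E_3)| \le \left(\frac{2}{9} + \frac{1}{9} + \frac{3000}{1881}\right)\frac{A(A-1)}{K} \le 1.93\,\frac{A(A-1)}{K},$$

and thus $\mathrm{Var}[X] \le 2A(A-1)/K + 1.93A(A-1)/K \le 4A^2/K$. ■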

Lemma A.22. There exists some constant $\varepsilon_0$ such that the following holds for $\varepsilon \le \varepsilon_0$. Let $\mathcal{H}$ be
a family of $c\log(K/\varepsilon)/\log\log(K/\varepsilon)$-wise independent hash functions mapping the $A + B$ good and
bad balls into $K$ bins for some sufficiently large constant $c > 0$. Suppose $A, B \le K/e$ and $A \ge 1$,
and we choose a random $h \in \mathcal{H}$ mapping balls to bins. For $i \in [K]$, let $X'_i$ be an indicator variable
which is 1 if and only if there exists at least one good ball, and no bad balls, mapped to bin $i$ by
$h$. Let $X' = \sum_{i=1}^{K} X'_i$. Then for a sufficiently large constant $c$, the following holds:

1. $|\mathbf{E}[X'] - \mathbf{E}[X]| \le \varepsilon\mathbf{E}[X]$

2. $\mathrm{Var}[X'] - \mathrm{Var}[X] \le \varepsilon^2$

Proof. Let $A_i$ be the random variable counting the number of good balls in bin $i$ when
picking $h$ from $\mathcal{H}$. Let $B_i$ be the number of bad balls in bin $i$. Define the function

$$f_k(n) = \sum_{i=0}^{k} (-1)^i \binom{n}{i}.$$

We note that $f_k(0) = 1$, $f_k(n) = 0$ for $1 \le n \le k$, and $|f_k(n)| \le \binom{n}{k+1}$ otherwise. Let $f(n) = 1$
if $n = 0$ and $0$ otherwise. We now approximate $X_i$ as $f_k(B_i)(1 - f_k(A_i))$. We note that this value
is determined entirely by $2k$-independence of the bins the balls are put into. We note that this is
also



The same expression holds for the X^', and thus both E[Xj'] and E[Xj] are sandwiched inside 
an interval of size bounded by twice the expected error. To bound the expected error we can use 
2{k + l)-independence. We have that the expected value of, say, [j^j) is (^^j^) ways of choosing 
/c + 1 of the good balls times the product of the probabilities that each ball is in bin i. This is 



k+l 



+ ly ~ V^(^ + 1) 

and similarly for E[(^:^'J]. Assuming that A,B < K/e, \E\Xi] - E[X|]| < e^/K as long as 6(2(A; + 
< which occurs for k = clog(il'/e)/loglog(iC/e) for sufficiently large constant c. In 
this case |E[X] - E[X']| < < eE[X] for sufficiently smah e since E[X] = when B <K and 
A>1. 
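
The function $f_k(n) = \sum_{i=0}^{k} (-1)^i \binom{n}{i}$ introduced at the start of this proof is a truncated inclusion-exclusion sum. Its basic properties are easy to check numerically; the sketch below does so, and also checks the standard identity $f_k(n) = (-1)^k \binom{n-1}{k}$ for $n \ge 1$, which is not stated in the text but explains why the error is governed by a binomial coefficient of order $k + 1$. The function name is hypothetical.

```python
from math import comb

def f_k(n, k):
    """Truncated inclusion-exclusion sum: sum_{i=0}^{k} (-1)^i C(n, i)."""
    return sum((-1) ** i * comb(n, i) for i in range(k + 1))

def check_properties(k, n_max):
    """Check the properties of f_k used in the proof for 0 <= n <= n_max."""
    assert f_k(0, k) == 1
    for n in range(1, n_max + 1):
        value = f_k(n, k)
        # Standard identity for the partial alternating sum of binomials.
        assert value == (-1) ** k * comb(n - 1, k)
        if n <= k:
            assert value == 0      # agrees with the indicator of n = 0
        else:
            assert abs(value) <= comb(n, k + 1)
    return True
```

For instance, `check_properties(3, 20)` passes: $f_3$ agrees with the indicator of $n = 0$ on $\{0, 1, 2, 3\}$, and beyond that its magnitude is at most $\binom{n}{4}$.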

We now analyze Var[X']. We approximate XiXj as fk{Bi)fk{Bj){l-fk{Ai)){l-fk{Aj)). This 
is determined by 4A:-independence of the balls and is equal to 




k + l 



We can now analyze the error using 4(fe + l)-wise independence. The expectation of each term 
in the error is calculated as before, except for products of the form 
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and similarly for Bi,Bj. The expected value of this is 



^k + l,k + lj -\k+lj - \K{k + l) 

Thus, again, if A, i? < K/e and k = c' log(i^/e)/ log log(i^/e) for c' sufficiently large, each 
summand in the error above is bounded by e^/ {32K^), in which case |E[XiXj] — E[XjXj]| < e^/K'^. 
We can also make c' sufficiently large so that |E[X] — E[X']| < e^/K'^. Now, we have 

Var[X'] - Var[X] < |(E[X] - E[X']) + 2^(E[X,X,] - E[X,'Xj]) - {E'[X] - B'[X'])\ 

i<j 

< \E[X] - E[X'] \ + K{K - 1) max \E[XiXi] - E[X'iX'A \ + \E'^[X] - E'^[X']\ 

i<j 



2\2\ 



< e^/K'^ + e'^ + E'^[X]{2e^/K'^ + {e^/K 

< 5e^ 

which is at most for e sufficiently small. I 

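Lemma A.23. There exists a constant $\varepsilon_0$ such that the following holds. Let $\mathcal{H}, X'$ be as in
Lemma A.22, and also assume $B = 0$ and $100 \le A \le K/20$ with $K = 1/\varepsilon^2$ and $\varepsilon \le \varepsilon_0$. Then

$$\Pr_{h \leftarrow \mathcal{H}}\left[|X' - \mathbf{E}[X]| \le 8\varepsilon\mathbf{E}[X]\right] \ge 3/4.$$

Proof. Observe that

$$\mathbf{E}[X] \ge \frac{1}{\varepsilon^2}\left(A\varepsilon^2 - \binom{A}{2}\varepsilon^4\right) \ge (39/40)A,$$

since $A \le 1/(20\varepsilon^2)$.

By Lemma A.22 we have $\mathbf{E}[X'] \ge (1 - \varepsilon)\mathbf{E}[X] > (9/10)A$, and additionally using Lemma A.21
we have that $\mathrm{Var}[X'] \le \mathrm{Var}[X] + \varepsilon^2 \le 5\varepsilon^2A^2$. Set $\varepsilon' = 7\varepsilon$. Applying Chebyshev's inequality,

$$\begin{aligned}
\Pr\left[|X' - \mathbf{E}[X']| \ge (10/11)\varepsilon'\mathbf{E}[X']\right] &\le \mathrm{Var}[X']/((10/11)^2(\varepsilon')^2\mathbf{E}^2[X']) \\
&\le 5 \cdot A^2\varepsilon^2/((10/11)^2(\varepsilon')^2(9/10)^2A^2) \\
&< (13/2)\varepsilon^2/(10\varepsilon'/11)^2 \\
&< 1/4.
\end{aligned}$$

Thus, with probability at least 3/4, by the triangle inequality and Lemma A.22 we have
$|X' - \mathbf{E}[X]| \le |X' - \mathbf{E}[X']| + |\mathbf{E}[X'] - \mathbf{E}[X]| \le 8\varepsilon\mathbf{E}[X]$. ■

A.4.1 Proofs from Section 5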

Here we provide the proofs of two lemmas used in the analysis of our $F_0$ algorithm in Section 5.
Proof (of Lemma 5.2). We calculate
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Proof (of Lemma We use B it eB to denote a value in [(1 — e)B, (1 + e)B]. Then, 
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